STATEMENT OF THE PROBLEM

The problem addressed in this research is one that has received
the attention of many people. It is this: "Is it possible to develop
computer programs which operate upon written English so as to produce
a form of representation that makes possible the generation of high
quality indexes by automated means?" Automated indexing procedures
have been largely relegated to the tasks of sorting, formatting and
printing, while the substance of the index is manually derived. Thus
the question is one of how to derive good, meaningful indexes from
documents automatically rather than manually.

Specifically, the problem which this research addresses is the
development of procedures which define the relational attributes of
words in English text. These relational attributes are considered
an essential part of any good index and therefore an antecedent to
the production of good indexes by automated means.

The organization of this dissertation is as follows: 

Chapter I presents an overview of theories of language. 

Chapter II presents a theoretical framework for an empirical 
investigation of language. 

Chapter III presents procedures for the identification of
relational attributes among words in English text.

Chapter IV describes procedures for the identification and
characterization, in relational terms, of clauses and phrases in
text.


Chapter V presents procedures for the assignment of case roles.
These case roles amount to a functional interpretation of text elements
(e.g., phrases).

Chapter VI relates the language analysis procedures presented in 
earlier chapters to the notion of indexing and proposes a graphical 
representation of English text as a general base from which a 
variety of indexes may be automatically derived. 

Chapter VII contains a brief statement of conclusions and of
direction for future research.

Finally, a KWIC index of all the references cited in this 
dissertation is included as a way of making parts of the dissertation 
accessible by other than sequential means. This index is given in 
Appendix F. 
CHAPTER I. OVERVIEW OF THEORIES OF LANGUAGE



A language is a set of principles relating meanings and 
phonetic sequences. 

R. W. Langacker, Fundamentals of Linguistic Analysis 

A language is any set of sentences over an alphabet. A
sentence over an alphabet is any string of finite length
composed of symbols from the alphabet. An alphabet or
vocabulary is any finite set of symbols.

J. E. Hopcroft and J. D. Ullman, Formal Languages
and their Relation to Automata

English, n., A language so haughty and reserved that few 
writers succeed in getting on terms of familiarity with
it. 

A. Bierce, The Enlarged Devil's Dictionary 

1. Introduction 

Printed or written language consists of a set of elements called words.
This set of words is the vocabulary of the language. A dictionary is a
collection of vocabulary elements along with a description of each
element. The permissible ways in which vocabulary elements may be
strung together (i.e., arranged in a linear sequence) are governed by a
set of rules called a grammar. The resulting sequence may be called a
sentence. The set of sentences permitted by the grammar constitutes a
language (1) (see Figure 1.1).
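
The relationship among vocabulary, dictionary, grammar, sentence and language described above can be illustrated with a toy sketch. The four-word vocabulary, the single word-order rule and all function names are assumptions made only for illustration; none of them is taken from this dissertation.

```python
from itertools import product

# Hypothetical toy dictionary: each vocabulary element is paired with a
# description. The two class labels are illustrative assumptions.
DICTIONARY = {
    "rocks": "thing",
    "horses": "thing",
    "fall": "behavior",
    "run": "behavior",
}
VOCABULARY = set(DICTIONARY)


def grammar_permits(words):
    """Toy grammar: one 'thing' word followed by one 'behavior' word."""
    classes = [DICTIONARY.get(w) for w in words]
    return classes == ["thing", "behavior"]


def language():
    """The language is the set of word sequences the grammar permits."""
    return {
        " ".join(pair)
        for pair in product(sorted(VOCABULARY), repeat=2)
        if grammar_permits(pair)
    }


if __name__ == "__main__":
    print(sorted(language()))
```

Even in this toy case the vocabulary is fixed and small, while the grammar alone decides which sequences count as sentences.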

While the derivation and definition of words is of interest,
especially to the lexicographer, etymologist, historical linguist and
sociolinguist, only a small part of modern linguistic effort is directed
toward the study of words. Most attention has been paid to the grammar,
or grammars, of a language, for the most part because the grammars of
natural languages are still largely intuitive and difficult to describe
precisely.

Figure 1.1 Illustration of various concepts which comprise
the notion of "language."

2. Traditional (Pedagogical) Grammar

Grammar describes the relationships which words bear to one 
another in a statement. 

J. Moffat, The Structure of English

Traditional grammar has been viewed as a model which dictates the form of
language. A sentence construction was, within the context of this
model, either right or wrong, and grammar books contained the last word on
the validity of a string of words as a sentence. The traditional
grammar model has been handed down for centuries as something not to be
questioned. Let us consider how this model came into being.

The study of language can be traced back to the Greeks of the 5th 
century B.C., when language was studied from philosophical perspectives.
Language was believed to contain the universal forms of thought. 
Syntactic forms of the language were derived and were defined in 
accordance with the philosophical significance of each form. The Greek 
theories were developed through an investigation of the processes 
governing thought and action (2). 

The parallel between this model of grammar, which developed at such
an early period, and our current traditional grammar is great. Here are
some of the steps in the development of Greek grammar. Plato, in one
of his dialogues, classified nouns and verbs on logical grounds.
Aristotle identified nouns, verbs and conjunctions (although the use of
the latter term differs from its current use). The Stoics added a class
of articles and examined number, voice, mood and case (nominative,
vocative, accusative, genitive and dative). Dionysios Thrax, an
Alexandrian scholar of the 1st century B.C., recognized eight parts of
speech: noun, verb, participle, article, pronoun, preposition, adverb
and conjunction. Remmius Palaemon added the interjection as a part of
speech in the first century A.D. This classical grammar survived in
over a thousand manuscripts and it formed the basis of the Latin grammar
recorded in the eighteenth century (3).

In the process of writing down an English grammar, early grammarians
were influenced by logic and by Latin grammar. To scholars of the 18th
century a correct sentence had to be logically correct. Thus, a rule
was put forth that words like "perfect," "round" and "square" could not
have a comparative degree. If a thing is perfect, it cannot be more
perfect since one cannot reach beyond perfection. If a thing is round,
it is already round and cannot become rounder. A modern linguist would
agree neither with the initial hypothesis nor with the outcome.

Latin grammar was a major factor in the formation of an English
grammar. The transference of Latin rules to English rested on the theory
that all languages had a common structure. As a result, traditional
grammar books refer to accusative and dative cases when there is no need
for them in English; to gender in nouns when gender in English nouns is
a part of their meaning; and to the restriction against placing a
preposition at the end of a sentence when this is not always
possible (4). While such attributes are important (e.g., in the
investigation of the derivation of English from Latin), these attributes
are not relevant to the definition of a grammar of English.



3. Contemporary Approaches to a Grammar of English 

The grammar of a language is a system of rules that determine
a certain pairing of sound and meaning. It consists of a
syntactic component, a semantic component and a phonological
component.

N. Chomsky, Aspects of the Theory of Syntax

Formally, we denote a grammar G by (V_N, V_T, P, S). The symbols
V_N, V_T, P and S are, respectively, the variables, terminals,
productions and start symbol.

J. E. Hopcroft and J. D. Ullman, Formal Languages
and their Relation to Automata



While traditional grammar has apparently been adequate for
pedagogical purposes, its inadequacies for other purposes became
acutely apparent in the 1950's as a consequence of the futile attempts
to automatically translate from one natural language to another. The
main result of the vast amounts of money, time and effort that these 
attempts consumed was the conclusion that much more research was needed 
to examine basic structures and operators of language (5). 

Since that time, several theories of language have been developed,
including theories of formal language, using the principles of
propositional calculus, group theory and automata theory. These theories,
although often readily susceptible to implementation, deal primarily
with restricted subsets of English or with "artificial" languages and are
therefore not applicable to English on a broad scale. Other theories,
which are descriptive in nature and are broadly applicable to natural
language, have not been easy to implement because of the complexity of
the procedures or because of the large size of the vocabulary of a
language. Grammars based on theories in this category include the
transformational-generative (6), dependency (7), immediate constituent
(8), phrase structure (9), predictive (10), systemic (11) and
stratified (12) grammars of language. Versions of some of these
grammars have been implemented, and have provided valuable empirical
data. However, their implementation on a practical scale, for
use with a large data base of technical documents (where the time to
handle a single sentence is important), seems impossible. The severity
of the time limitation suggested here can be seen in an attempt by
Winograd to implement a theory of language. While his work in this area
is extensive, and his results important, the system he has developed
deals with an extremely limited universe of discourse (13), and its
extension to some more extensive universe of discourse seems impractical.

1. The frequent inappropriateness of this rule is illustrated by a
statement attributed to Winston Churchill: "That is an imposition
up with which I will not put!"

All of the theories mentioned here have unquestionably contributed
to our understanding of language, but I believe they fail to account
for the basic purpose of language and that they therefore fail to deal
with the fundamental properties of language. In the next chapter, I
propose a theoretical framework for studying the English language. I
believe this theory makes it possible to operate on English utterances
in algorithmic terms and that the results of these operations will be
useful in developing automated indexing procedures.

2. Everywhere in this dissertation the word implement (and its
derivatives) means "to render in a form suitable for processing by
a digital computer."
CHAPTER II. A THEORETICAL FRAMEWORK FOR AN EMPIRICAL INVESTIGATION OF
LANGUAGE



I want to put the case clearly before you, and I will therefore
show you what I mean by another familiar example. I will suppose
that one of you, on coming down in the morning to the parlour of
your house, finds that a tea-pot and some spoons which had been
left in the room on the previous evening are gone,--the window is
open, and you observe the mark of a dirty hand on the window-frame,
and perhaps, in addition to that, you notice the impress
of a hob-nailed shoe on the gravel outside. All these phenomena
have struck your attention instantly, and before two seconds have
passed you say, "Oh, somebody has broken open the window, entered
the room, and run off with the spoons and the tea-pot!" That
speech is out of your mouth in a moment. And you will probably
add, "I know there has; I am quite sure of it!" You mean to say
exactly what you know; but in reality you are giving expression
to what is, in all essential particulars, an hypothesis. You do
not know it at all; it is nothing but an hypothesis rapidly framed
in your own mind. And it is an hypothesis founded on a long train
of inductions and deductions.

T. H. Huxley, Darwiniana

1. Introduction 

A theoretical framework is essential to begin an experimental work. 
An hypothesis may be changed or discarded or new hypotheses may emerge 
in the course of an investigation, but only within a theoretical framework 
can experimental results assume general implications. 

A theory rests upon observation and is verified through observation.
Thus in any study of language one must gather evidence which may enable
him to hypothesise about causes and effects. When we ask questions such
as "What is the function of the preposition 'of' in English?" it is
clear that we are already pretty far down a particular line of
investigation having, apparently, observed such things as words, classified
them (among other things) according to some scheme, and determined that at
least some words have particular functions which it is our desire to
determine.

While it is desirable that theories be as general as possible, it
is rare that general theories emerge at early stages of an investigation.
Rather, theories are often devised for particular purposes and without
any particular concern for generality. That is the case with the 
theoretical framework of English described in this Chapter. Prior to 
describing this theory, however, I feel it is necessary to make clear 
to the reader my views on the function and purpose of a theory. I 
attempt to do this in the following section. I hasten to point out that 
the remarks presented below are not to be construed as expressing any 
original thought on my part. On the contrary, these views are acknowledged 
to be the basis of scientific investigation (1). But scientists and 
laymen alike sometimes become so convinced of a theory, that the theory 
is accepted as a law. This acceptance has been the case in the treatment 
of the traditional model of grammar as an absolute. I therefore feel it 
is necessary to emphasize what Huxley has so admirably expressed in the 
quotation at the beginning of this Chapter concerning the nature of theory. 

2. Function and Purpose of Theory 

A theory is a way of representing, organizing and observing
phenomena. It is a statement about the way in which one views and perceives
phenomena (2). A theory is not purported to be an absolute. At one
time, people held to the theory that the universe revolved around the
earth. Ptolemy, working from this view of the universe, developed a system
of epicycles to describe the movement of the universe. The mathematical
formulations which he developed precisely described and predicted the
movement of the planets. The theory is internally consistent and, even
more impressive, experimentally verifiable. Now that man can look out
from the stars, his view of the universe has changed and has necessitated
different theories. But this does not destroy the consistency or precision
of the earlier theory. And there is no guarantee that, at some future
date, a new dimension will not be added to the observables we now comprehend,
and that new theories of the behavior of the universe will not emerge. This
example illustrates that the structure set down by a theory and the
relationships postulated among the objects with which it deals are not
attributes of the objects themselves. On the contrary, a theory imposes
some form of logic or structure upon observables and provides for the
interpretation of the observables within this framework (3).

For my purposes, a theory must not only be adequate in some abstract 
sense, it must also be useful. But how is utility to be measured? I 
can only say that for me a theory is useful so long as it helps toward 
the achievement of certain goals I have established for myself. Of 
course goals have a way of changing with time and thus so must theories. 
In effect, then, I judge a theory according to whether it serves me well 
in some investigation. 

The main purpose of this chapter is to describe a theoretical
framework for language which will serve for the analysis and description
of language, at least of English, in terms that are considered beneficial
in indexing, especially automatic indexing. This theoretical framework
has permitted the expression of rules upon which algorithms have been
built to effect various analyses of English. These algorithms are
described in Chapters III-V.

3. A Theoretical Framework for the Description and Analysis of Written 
English 

3.1. Introduction 

Language is a means of communication. But this sweeping statement
is not particularly helpful. However, if one accepts English as a
subset of language, the subject matter to be dealt with is reduced to
somewhat more manageable proportions, thereby excluding body language,
chemistry, mathematics, and other languages from consideration. But
English may be written or spoken and it seems that the properties of the
two are sufficiently different that one may further reduce the subject
matter by limiting one's attention to written English. In particular,
the language dealt with in this research is that of technical and literary
works.

3.2. The Basic Elements of Language 

Whatever language is being considered, it seems to me that it may
be described as a system of things and relations between them. In
general, what are observed day by day are things. Furthermore, these
things exhibit changes with time. It is frequently found by observers
to be useful or necessary to name the things and the changes they exhibit.
For instance, when a rock is observed to fall from a cliff and when the
observer wishes to communicate with someone about the changes the thing
has undergone, the object may be called "rock" and the change in its
location with respect to time "fall." If one is near the terminus of
the rock's path, one might say the rock "dropped," whereas if one is near
the point from which it fell, one might say the rock "fell." The place
from which the rock came (or departed) may also be named, for example,
"cliff." The important thing to note here is that names are given to
things and to behaviors exhibited by (or, if you will, attributed to)
the things. (Any observable change of state, no matter how slight, I
shall call a behavior.) It is also important to note that a great effort
is made to differentiate between the two types of names. Thus, I argue
that language has as its basic elements names of things and names of
behaviors. But there is a fundamental difference between the role and
nature of names of things and of names of behaviors. Since things are
often directly sensible and behaviors are never sensible, the former are
treated as though they were related by means of the latter. In the
example above, of the falling rock, the rock is directly sensible (by
touch, smell, etc.) whereas "falling" is a name given to the change in
location of the rock with respect to time. That is, "falling" is the
name given to the relation between the rock and its successive locations
in space at successive points in time. I hope this illustration serves
to clarify the notion of name-of-things, which I shall call, simply,
NAME, and of name-of-behavior, which I shall call RELATION.

3. An algorithm is a set of precise, unambiguous rules the application of
which must produce results that are independent of the machine (or
person) applying them (4).

4. Simmons, et al. (5), use the term "concept" instead of thing, but I
prefer "thing" as being potentially more general.

5. I may surely be forgiven for using the language I am attempting to
describe. Subsequently, a word or phrase given special meaning
within the context of this dissertation will be printed in upper
case letters.

Many researchers have in one way or another treated languages as
relational systems. Rothstein has proposed the use of binary relations
in representing strings of a language (6, 7). In a model of verbal
understanding, Simmons (8) has defined primitive elements of his model
to be concepts and relations; these primitives are essentially equivalent
to my thing and relation. And as will be seen subsequently, the views of
Montgomery (9), Fillmore (10), Chafe (11), and others are compatible
with the relational nature of English I put forward here. I must note
that the relations of which I speak are of two types: the one having
some referent in actual experience, the other being exclusively a
linguistic device. This matter is discussed below in Section 3.3.2.

To summarize to this point, I have argued that a language consists
of NAMEs and RELATIONs. These names and relations are, in turn, strings
of symbols formed from a basic set of symbols called an ALPHABET.
Elementary alphabetical strings are called WORDs. A NAME is a word or
string of words assigned to a thing, and a RELATION is a word or string
of words assigned to a behavior. I have also said that a language
provides for the production of more complex names or relations by
combination of simpler names or relations. Let us now consider how this
is done in English.
3.3. The Naming Process in Language

"The name of the song is called 'Haddocks' Eyes.'" "Oh, that's
the name of the song, is it?" Alice said, trying to feel
interested. "No, you don't understand," the Knight said, looking
a little vexed. "That's what the name is called. The name really
is 'The Aged Aged Man.'" "Then I ought to have said 'That's
what the song is called'?" Alice corrected herself. "No, you
oughtn't: that's quite another thing! The song is called 'Ways
and Means': but that's only what it's called, you know!"
"Well, what is the song, then?" said Alice, who was by this
time completely bewildered. "I was coming to that," the Knight
said. "The song really is 'A-sitting On A Gate': and the tune's
my own invention."

Lewis Carroll, Through the Looking Glass

While NAMEs and RELATIONs denote basic elements of a language, it
will usually be found that simple NAMEs (single words) may be combined
for naming many things or behaviors so that it is unnecessary to assign
a unique name to every thing or behavior. In other words, the basic
vocabulary of a language will usually be found to be rather quickly
extended to practical limits (of memory, essentially), and to
continue to name things and behaviors in a practical way, additional
linguistic devices are required. For instance, there are many horses in
the world and it would be cumbersome at best to have to assign a unique
word, as a NAME, to each and every one, even though the language might
provide the capability of doing so. Instead languages provide for the
modification of basic NAMEs by permitting several NAMEs to be related to
one another in special ways. A horse may be brown or black or large or
fast or wild or docile. It may run or walk or trot or gallop and these
terms may be specified as fast or slow or fluid or jerky or stylish. My
point is that by providing appropriately for combining simple NAMEs to
form more complex ones, a language becomes more powerful without the
burden of a huge vocabulary. The relations between the NAMEs may be
explicitly represented or not. For example, the relation between
"brown" and "horse" in the NAME "brown horse" is established, in English,
by the positioning of the words, whereas explicit relational words are
often used, as in "house of wax," where "of" serves this purpose.

6. Recall the definition of vocabulary given in Chapter I.

3.3.1. Classification of NAMEs

A NAME may be simple, composite or complex. A SIMPLE NAME is a
single word (string of alphabetical symbols exclusive of the blank) which
is assigned to a thing. A COMPOSITE NAME is a consecutive sequence of
SIMPLE NAMEs separated by one or more blanks. A COMPLEX NAME is an
ordered triple, N_i R N_j, where N_i, N_j are SIMPLE, COMPOSITE or COMPLEX
NAMEs, or are vacuous, and R is a RELATION. Examples of these three
classes of NAME are given in Table 2.1.
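
The three classes of NAME can be sketched as a small recursive data structure. This is only an illustration under stated assumptions: the class and function names are hypothetical, a vacuous NAME is represented by None, and the requirement that a COMPOSITE NAME contain at least two SIMPLE NAMEs is an assumption added here to keep it distinct from a SIMPLE NAME.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class SimpleName:
    word: str  # a single word containing no blank

    def __post_init__(self):
        if not self.word or " " in self.word:
            raise ValueError("a SIMPLE NAME is a single word without blanks")


@dataclass(frozen=True)
class CompositeName:
    parts: Tuple[SimpleName, ...]  # consecutive SIMPLE NAMEs

    def __post_init__(self):
        # Assumption of this sketch: at least two SIMPLE NAMEs are required.
        if len(self.parts) < 2:
            raise ValueError("a COMPOSITE NAME needs at least two SIMPLE NAMEs")


Name = Union[SimpleName, CompositeName, "ComplexName"]


@dataclass(frozen=True)
class ComplexName:
    left: Optional[Name]   # N_i; None stands for a vacuous NAME
    relation: str          # R, a RELATION
    right: Optional[Name]  # N_j; None stands for a vacuous NAME


def composite(text: str) -> CompositeName:
    """Build a COMPOSITE NAME from words separated by one or more blanks."""
    return CompositeName(tuple(SimpleName(w) for w in text.split()))


if __name__ == "__main__":
    print(composite("heavy black stone"))
    print(ComplexName(SimpleName("house"), "of", SimpleName("wax")))
```

Because a COMPLEX NAME may contain other COMPLEX NAMEs on either side of its RELATION, the structure nests to any depth.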

In order to establish a relation between the NAME types defined here 
and traditional linguistic terminology, the reader may note that, from 
the example of Table 2.1, SIMPLE NAMEs seem to correspond with nouns, 
COMPOSITE NAMEs with a noun preceded by a series of adjectives, and a 
COMPLEX NAME with no traditionally defined entity. It is emphasized 
that no effort has so far been made to investigate the function of NAMEs 
in English, save that they represent things. Nor has any mention been 
made of possible methods for identifying them in English utterances. 
These matters are dealt with later. 



Table 2.1 Examples of SIMPLE, COMPOSITE and COMPLEX NAMEs from
English.

SIMPLE         COMPOSITE               COMPLEX

horse          horse blanket           he is a man
trees          pine cone               a tree by the stream
rock           heavy black stone       rocks in his head
wind           cold front              the edge of a hurricane
water          running water           mountains shrouded in mist
ocean          white caps
man            computer programs       Of Mice and Men
history        man's birth
time           space/time continuum
Mississippi    Missouri River

3.3.2. Classification of Relations

RELATIONs are of two kinds: primary and secondary. A PRIMARY RELATION
is defined as a single word which is assigned to a behavior. PRIMARY
RELATIONs are, in turn, of two types: dominant and recessive. A
DOMINANT PRIMARY RELATION (DP-RELATION) is a PRIMARY RELATION which can
serve in isolation to relate one or more names. A RECESSIVE PRIMARY
RELATION (RP-RELATION) is a PRIMARY RELATION which serves as an argument
of another PRIMARY RELATION. SECONDARY RELATIONs are defined as special
linguistic elements which have no referents in experimental behaviors
but serve only to relate NAMEs to NAMEs or to relate RELATIONs to NAMEs
or to RELATIONs.

RELATIONs are also designated as simple or composite. A SIMPLE
RELATION is a single PRIMARY or SECONDARY RELATION. A series of SECONDARY
RELATIONs is a COMPOSITE SECONDARY RELATION (CS-RELATION). A series of
DP-RELATIONs or a series of DP-RELATIONs followed by a SECONDARY RELATION
is a COMPOSITE DOMINANT PRIMARY RELATION (CDP-RELATION). A series of
RP-RELATIONs or a series of RP-RELATIONs followed by a SECONDARY RELATION
is a COMPOSITE RECESSIVE PRIMARY RELATION (CRP-RELATION). An RP-RELATION
in juxtaposition with a DP-RELATION cannot be subsumed within the
DP-RELATION. Rather the RP-RELATION serves as a COMPLEX NAME or, as
stated before, as an argument of the DP-RELATION.
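
The composite classes just defined can be sketched as pattern matching over a sequence of tags. The one-letter tags (D, R, S), the function name and the "UNCLASSIFIED" result are assumptions of this sketch. It also assumes that a single DP- or RP-RELATION followed by a SECONDARY RELATION already counts as composite, which the definitions above do not settle.

```python
import re

# Tags assumed for this sketch:
#   D = DP-RELATION, R = RP-RELATION, S = SECONDARY RELATION


def classify_series(tags: str) -> str:
    """Classify a series of adjacent simple RELATIONs given as a tag string."""
    if len(tags) == 1 and tags in "DRS":
        return "SIMPLE RELATION"
    if re.fullmatch(r"S{2,}", tags):
        return "CS-RELATION"    # a series of SECONDARY RELATIONs
    if re.fullmatch(r"D+S?", tags):
        return "CDP-RELATION"   # DP-RELATIONs, optionally followed by one S
    if re.fullmatch(r"R+S?", tags):
        return "CRP-RELATION"   # RP-RELATIONs, optionally followed by one S
    # Mixed D/R series are not merged: the RP-RELATION is an argument
    # of the DP-RELATION rather than a part of it.
    return "UNCLASSIFIED"


if __name__ == "__main__":
    for series in ("D", "SS", "DDS", "RR", "DR"):
        print(series, classify_series(series))
```

Leaving a mixed series such as "DR" unclassified follows the rule that an RP-RELATION in juxtaposition with a DP-RELATION is not subsumed within it.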

This hierarchy of RELATIONs is shown in Figure 2.1. Relations have
some connection with traditional linguistic terminology. PRIMARY
RELATIONs correspond roughly with verbs, and SECONDARY RELATIONs with
prepositions and conjunctions. RELATIONs in English are considered to
function in the mathematical sense so that they take arguments just as
do mathematical relations. It is important to note, too, that RELATIONs
may connote various kinds of relationship between their arguments (such
as equivalence, space/time, etc.). These connotations, though important
in the broader aspects of linguistics, are not of interest in this work.

3.3.3. Definition of "Sentence"

From the foregoing definitions a SENTENCE can now be defined. A
SENTENCE is an ordered triple N_i R N_j, where N_i, N_j are NAMEs (SIMPLE,
COMPOSITE, COMPLEX or vacuous) and where R is a PRIMARY RELATION (SIMPLE
or COMPOSITE, never vacuous). If N_i, N_j do not themselves contain PRIMARY
RELATIONs, then this definition of "sentence" is equivalent to Cook's
definition of "clause" (12). I shall use Cook's definition of "clause,"
so that "sentence" is superordinate to "clause."
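
The distinction between SENTENCE and CLAUSE can be sketched by counting PRIMARY RELATIONs in a nested triple. The class and function names are hypothetical, a plain string stands in for any SIMPLE or COMPOSITE NAME, None stands for a vacuous NAME, and the analyses of the example strings are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Relation:
    words: str
    primary: bool  # True for a PRIMARY RELATION, False for a SECONDARY one


@dataclass(frozen=True)
class Triple:
    """An ordered triple N_i R N_j; None stands for a vacuous NAME."""
    left: Optional["Name"]
    relation: Relation
    right: Optional["Name"]


Name = Union[str, Triple]  # a str stands for a SIMPLE or COMPOSITE NAME


def primary_count(name: Optional[Name]) -> int:
    """Count the PRIMARY RELATIONs contained anywhere inside a NAME."""
    if not isinstance(name, Triple):
        return 0
    own = 1 if name.relation.primary else 0
    return own + primary_count(name.left) + primary_count(name.right)


def is_sentence(t: Triple) -> bool:
    """A SENTENCE is a triple whose R is a PRIMARY RELATION."""
    return t.relation.primary


def is_clause(t: Triple) -> bool:
    """A CLAUSE is a SENTENCE containing exactly one PRIMARY RELATION."""
    return is_sentence(t) and primary_count(t) == 1


if __name__ == "__main__":
    inner = Triple("he", Relation("is", True), "a man")
    outer = Triple("I", Relation("believe", True), inner)
    print(is_clause(inner), is_clause(outer))
```

Under these assumptions, every CLAUSE is a SENTENCE, but a SENTENCE whose NAMEs embed a further PRIMARY RELATION is not a CLAUSE.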

3.3.4. Interregnum

Let me give here an informal summary of the elements of the theory
of language I am proposing. The basic elements of a language are things
called symbols or alphabetical symbols. From these symbols, strings
(linear sequences) of them may be formed which are called words. The
set of such words is called a vocabulary. Some of these words name
things and I have called these words NAMEs. Some words name behaviors of
things, and I have called these words RELATIONs. More complex NAMEs
and RELATIONs may be produced by stringing together simpler NAMEs and
RELATIONs. A SENTENCE was defined as an ordered triple N_i R N_j such that
R is a PRIMARY RELATION. Given the special condition that the sentence
contains but one PRIMARY RELATION, a CLAUSE was defined. In the next
section I will show, for English, the way in which these various
linguistic elements are employed, in order to complete the theory. The
various terms so far defined are summarized in Table 2.2. A decomposition
of a sentence based on the concepts presented is given in Figure 2.2.

7. Cook states that a clause is a string of words with one and only one
predicate (in my terms predicate = PRIMARY RELATION).

3.4. Formal Statement of a Theory of Language

In Section 3.3 I have defined and exemplified certain notions basic 
to the theory of language I propose. The way in which these basic 
notions are interrelated in language utterances must now be specified in 
order to produce a coherent theory of language (at least of English). I 
shall first consider the function of RELATIONS and then shall define 
successively larger aggregates of NAMEs and RELATIONS culminating in 
the definition of SENTENCE. 

3.4.1. PRIMARY RELATIONs

In Section 3.3.2, RELATIONs were categorized as primary or secondary,
simple or composite, dominant or recessive. The relationships between
these categories of RELATION are illustrated in Figure 2.3. Some
additional terminology is needed to facilitate further discussion of
PRIMARY RELATIONs. COMPOSITE PRIMARY RELATIONs consist of a series of
PRIMARY RELATIONs. Let us call the right-most RELATION a MAIN RELATION
and the other RELATIONs in the series ALLIED RELATIONs. Thus, a PRIMARY
RELATION may be characterised as

     (ALLIED RELATION)^n MAIN RELATION     (n ≥ 0)

If n > 0, then the string constitutes a COMPOSITE PRIMARY RELATION.

                 RELATION

          PRIMARY        SECONDARY

     DOMINANT   RECESSIVE

Figure 2.3 Relationships between categories of RELATION.

It will be convenient to further partition ALLIED RELATIONs into
three classes: AUXILIARYs, MODALs and ADJUNCTs. These three classes
are each ostensively defined as in Table 2.3. It can be seen that these
three classes of ALLIED RELATION are quite similar to the traditional
classes: auxiliary verbs, modal verbs and adjunct verbs.
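
The characterisation of a PRIMARY RELATION as zero or more ALLIED RELATIONs followed by a MAIN RELATION can be sketched as follows. The word lists are transcribed from Table 2.3; the function names are hypothetical, and the sketch assumes that the words of a single PRIMARY RELATION have already been isolated from the surrounding text, which is not attempted here.

```python
# Word lists transcribed from Table 2.3.
AUXILIARY = {"am", "are", "be", "been", "being", "is",
             "had", "has", "have", "having", "was", "were"}
MODAL = {"can", "could", "may", "might", "must",
         "shall", "should", "will", "would"}
ADJUNCT = {"did", "do", "does", "keep", "kept", "get", "got", "let"}


def allied_class(word):
    """Return the ALLIED RELATION class of a word, or None if it is not listed."""
    w = word.lower()
    if w in AUXILIARY:
        return "AUXILIARY"
    if w in MODAL:
        return "MODAL"
    if w in ADJUNCT:
        return "ADJUNCT"
    return None


def split_primary_relation(words):
    """Split a PRIMARY RELATION into (labelled ALLIED RELATIONs, MAIN RELATION).

    The right-most word is taken as the MAIN RELATION; every word to its
    left is treated as an ALLIED RELATION and labelled with its class.
    """
    if not words:
        raise ValueError("a PRIMARY RELATION is never vacuous")
    *allied, main = words
    return [(w, allied_class(w)) for w in allied], main


def is_composite(words):
    """n > 0 ALLIED RELATIONs makes the PRIMARY RELATION composite."""
    return len(words) > 1


if __name__ == "__main__":
    print(split_primary_relation(["should", "have", "been", "kept"]))
```

In the example, "kept" is taken as the MAIN RELATION even though it also appears in the ADJUNCT list, since position rather than list membership decides which word is MAIN.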

MAIN RELATIONs may be either dominant or recessive (this distinction
does not apply to ALLIED RELATIONs). The purpose of this distinction
will be made clear shortly. RECESSIVE MAIN (RM) RELATIONs are of two
types: PARTICIPIAL or INFINITIVAL. PARTICIPIAL RM-RELATIONs are SIMPLE
RELATIONs ending in "ing," except for those mentioned in Table 2.4.
INFINITIVAL RM-RELATIONs are classed as marked or unmarked. Marked
INFINITIVAL RM-RELATIONs are those RM-RELATIONs which immediately follow
the SECONDARY RELATION "to." All other INFINITIVAL RM-RELATIONs are
unmarked.

8. The definition given of a PARTICIPIAL RM-RELATION must be distinguished
from the traditionally accepted definition of participle. A
PARTICIPIAL RM-RELATION is defined as the name of a behavior. Thus,
the word "barking" in the phrase "the barking dog" is part of a
COMPOSITE NAME, not a PARTICIPIAL RM-RELATION.

9. As will be seen, this is not an adequate definition of unmarked
INFINITIVAL RM-RELATIONs, for analytical purposes. No satisfactory
general definition has so far been found.
Table 2.3 Classes of ALLIED RELATION and their Elements [10].

AUXILIARY    MODAL     ADJUNCT

am           can       did
are          could     do
be           may       does
been         might     keep
being        must      kept
is           shall     get
had          should    got
has          will      let
have         would
having
was
were


10. It is emphasized that the elements listed may also be found to
be elements of the set of MAIN RELATIONs; the three classes are
mutually exclusive only within the framework of ALLIED RELATION.
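
As a rough illustration of how Table 2.3 and the characterization of a PRIMARY RELATION as a run of ALLIED RELATIONs followed by a MAIN RELATION fit together, the following Python sketch splits a word sequence accordingly. The function names are hypothetical, and labelling the n = 0 case "SIMPLE" is an assumption made for the example.

```python
AUXILIARY = {"am", "are", "be", "been", "being", "is", "had", "has",
             "have", "having", "was", "were"}
MODAL = {"can", "could", "may", "might", "must", "shall", "should",
         "will", "would"}
ADJUNCT = {"did", "do", "does", "keep", "kept", "get", "got", "let"}


def allied_class(word):
    """Return the Table 2.3 class of `word`, or None if it is not listed."""
    w = word.lower()
    if w in AUXILIARY:
        return "AUXILIARY"
    if w in MODAL:
        return "MODAL"
    if w in ADJUNCT:
        return "ADJUNCT"
    return None


def split_primary_relation(words):
    """Split a PRIMARY RELATION into ALLIED RELATIONs and a MAIN RELATION.

    The right-most word is always taken as the MAIN RELATION, even if it
    also appears in Table 2.3. Every word before it must belong to one
    of the three ALLIED classes.
    """
    if not words:
        raise ValueError("a PRIMARY RELATION needs at least one word")
    *allied, main = words
    labelled = []
    for w in allied:
        cls = allied_class(w)
        if cls is None:
            raise ValueError(f"{w!r} is not a listed ALLIED RELATION")
        labelled.append((w, cls))
    # Assumption: a PRIMARY RELATION with no ALLIED RELATIONs is "SIMPLE".
    kind = "COMPOSITE" if labelled else "SIMPLE"
    return {"allied": labelled, "main": main, "kind": kind}
```

Applied to the words of "has been gone", the sketch labels "has" and "been" as AUXILIARYs and takes "gone" as the MAIN RELATION.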



Table 2.4 RELATIONs ending in "ing" which are not
PARTICIPIAL RM-RELATIONs.



bring ring cling string 

ding sing ping swing 

fling spring bing wing 
The complete hierarchy of PRIMARY RELATIONs is given in Figure 2.4.

3.4.2. SECONDARY RELATIONs

As stated earlier, SECONDARY RELATIONs are special linguistic
devices which have no extralinguistic referents. However, SECONDARY
RELATIONs play an important role in gaining an understanding of English.
Therefore, it is necessary to make a number of subdivisions of SECONDARY
RELATIONs. This is done in the following section. Some initial indication
of the functioning of SECONDARY RELATIONs is also introduced at this
time.

SECONDARY RELATIONs are initially divided into two classes:
CONJUNCTIVE and ATTRIBUTIVE; these two classes are discussed, in turn,
below.

3.4.2.1. The CONJUNCTIVE SECONDARY RELATIONs

CONJUNCTIVE SECONDARY RELATIONs (CSR) are categorized according to
function. The categories are named COORDINATE, SUBORDINATE, ADVERBIAL, 
NOMINAL and ADJECTIVAL. Each of these categories is defined in the 
following paragraphs. 

There are just five COORDINATE CSRs:

and    not    but    nor    or

SUBORDINATE CSRs are:

if       however      although
than     therefore    thus
then     though       whether
since    yet          unless

Seven elements make up the set of ADVERBIAL CSRs:

when     from    to
where    by
in       for

The following words constitute the set of NOMINAL CSRs:

what     whoever     that
which    whomever    whom
why      whatever    of
how

Finally, the ADJECTIVAL CSRs are:

who      when    of
whom     that
where    whose
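
The five CSR lists above can be held as sets, which makes it easy to see that a single word may belong to several classes at once. The sketch below is only illustrative: the names `CSR_CLASSES` and `candidate_csr_classes` are hypothetical, and the first ADVERBIAL entry, which is hard to read in the source, is assumed here to be "when".

```python
CSR_CLASSES = {
    "COORDINATE": {"and", "not", "but", "nor", "or"},
    "SUBORDINATE": {"if", "than", "then", "since", "however", "therefore",
                    "though", "yet", "although", "thus", "whether",
                    "unless"},
    # Assumption: the first ADVERBIAL entry is read as "when".
    "ADVERBIAL": {"when", "where", "in", "from", "by", "for", "to"},
    "NOMINAL": {"what", "which", "why", "how", "whoever", "whomever",
                "whatever", "that", "whom", "of"},
    "ADJECTIVAL": {"who", "whom", "where", "when", "that", "whose", "of"},
}


def candidate_csr_classes(word):
    """Return every CSR class (sorted by name) that lists `word`."""
    w = word.lower()
    return sorted(name for name, members in CSR_CLASSES.items()
                  if w in members)
```

A lookup of "that" returns both ADJECTIVAL and NOMINAL, so membership alone cannot decide the function of the word in a given sentence.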



Obviously these classes of CONJUNCTIVE SECONDARY RELATIONs are
neither mutually exclusive among themselves nor with respect to other
classes of SECONDARY RELATIONs (see below). The means of distinguishing
between the classes given a particular element will be dealt with in
Section 3.4.3. First let me treat ATTRIBUTIVE SECONDARY RELATIONs.

3.4.2.2. The ATTRIBUTIVE SECONDARY RELATIONs

ATTRIBUTIVE SECONDARY RELATIONs (ASR) are listed below.

of        through    above
to        down       across
in        between    outside
for       under      except
with      off        beyond
on        during     inside
at        without    instead
by        around     throughout
from      upon       despite
out       until      about
up        toward     into
over      among      below
after     within     according
before    along      behind

ASRs serve either an ADJECTIVAL or an ADVERBIAL function. ADJECTIVAL
ASRs are "of" and those ASRs which follow a DOMINANT RELATION and
precede a SIMPLE or COMPOSITE NAME. All other ASRs are ADVERBIAL.

The hierarchy of SECONDARY RELATIONs is shown in Figure 2.5.

3.4.3. NAMEs

The concept NAME was defined in Section 3.3. It will be convenient
to expand upon this concept both to provide additional useful terminology
and to make clear certain functional distinctions between SECONDARY
RELATIONs as alluded to above.

3.4.3.1. The Notion of PHRASE

The term PHRASE will be applied as follows. A SIMPLE or COMPOSITE
NAME will be called a NOMINAL PHRASE. Thus, any element in either of
the first two columns of Table 2.1, for example, amounts to a NOMINAL
PHRASE.

If either a SIMPLE or COMPOSITE NAME is immediately preceded by
(i.e., is an argument of) an ATTRIBUTIVE SECONDARY RELATION, then the
RELATION/NAME pair is called a SECONDARY PHRASE.

A COMPOSITE PRIMARY RELATION is called a PRIMARY PHRASE.

A hierarchical arrangement of PHRASE types is shown in Figure 2.6.
Examples of them are given in Table 2.5.

3.4.3.2. The Notion of CLAUSE 

A COMPLEX NAME containing one and only one PRIMARY RELATION is a
PRIMARY COMPLEX NAME or CLAUSE. There are four basic types of CLAUSE:
PRINCIPAL, NOMINAL, ADVERBIAL and ADJECTIVAL, related as illustrated in
Figure 2.7.
[Figure 2.5: tree diagram. SECONDARY RELATION divides into CONJUNCTIVE (COORDINATE, SUBORDINATE, ADJECTIVAL, NOMINAL, ADVERBIAL) and ATTRIBUTIVE (ADJECTIVAL, ADVERBIAL).]
Figure 2.5 Hierarchical partitioning of SECONDARY RELATION.
Table 2.5 Examples of NAMEs, PHRASEs and CLAUSEs

Term                 Illustrative Example

COMPLEX NAME         the book on the table
COMPOSITE NAME       the thick blue book
SIMPLE NAME          book

NOMINAL PHRASE       the thick blue book
PRIMARY PHRASE       has been gone
SECONDARY PHRASE     in the thick blue book

ADJECTIVAL CLAUSE    the girl sitting in the chair
ADVERBIAL CLAUSE     the birds flew where the weather was warm
NOMINAL CLAUSE       flying planes can be dangerous
PRINCIPAL CLAUSE     the musician became proficient by
                     practicing every day.
3.4.4. SENTENCE

Thus far, I have written at length about NAMEs and RELATIONs, and I
have given some indication as to the ways [11] these basic language elements
are combined to form more complex units. But it still remains to consider
the ways [11] in which these units are combined to produce the fundamental
element of written English: the SENTENCE [12].

Traditionally, a sentence has been thought of as a string of words
expressing a complete thought (13). More recently, a sentence has been
viewed as a string of words containing at least one predicate (14), or
as any string of finite length composed of symbols from an alphabet (15).
In this research, the simplest form of sentence has been the CLAUSE. If
one considers that a RELATION demands a certain number of NAMEs as
arguments and that these arguments may themselves contain other RELATIONs
and their arguments, it is easy to see that English is a relational
language after the notions of Rothstein (16).

I would like to think of SENTENCE as a NAME. Of course, SENTENCEs
may be of varying complexities, ranging from just a single PRIMARY RELATION
to strings containing many NAMEs and RELATIONs. But the problem is to
specify the way in which relatively simple language units may be combined
to yield SENTENCEs. From my point of view, it will suffice to show that,

11. But not they are so combined. 

12. Communication through written English does employ yet higher
aggregations of linguistic units, but I shall not attempt to
incorporate them into my theory of language at this time (see,
however, Strong (17)).
given a string of English words, rules can be devised within the
framework of the theory of language I have so far presented that permit the
identification of the linguistic units specified within the theory and
that, as a consequence, permit one to say whether the given string is or
is not an English sentence.

Since, within the theoretical framework I have postulated, all
types of relations have been identified and defined, I shall define
SENTENCE quasi-ostensively through specification of the argument(s)
each type of RELATION may take. The list of RELATIONs to be dealt with
is given in Table 2.6, together with the range of arguments each may
take. In order to express the RELATION/argument in a simple way I shall
adapt Griswald's prefix notation (18) for my purposes. A few words are
in order concerning this notation.

In prefix notation, the operator (relation) occurs first, followed
by the arguments which it relates. Thus, the expression $A + B$ would
become $+AB$. The number of arguments associated with an operator is
specified by a numeral subscripted to the operator. Thus, the previous
expression becomes $+_2AB$. To avoid ambiguity in the application of
operators, a precedence is assigned to them (i.e., they are given a
priority for first application).
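
Because every operator carries its argument count, a prefix string can be read back into a tree without parentheses. The following Python sketch shows one way to do so; it is an illustration only, and the token encoding (operators as `(symbol, arity)` pairs, operands as plain strings) and the name `parse_prefix` are assumptions made for the example.

```python
def parse_prefix(tokens):
    """Parse a prefix expression into a nested tuple.

    Each token is either an operand (a plain string such as "A") or an
    operator written as a (symbol, arity) pair such as ("+", 2).
    """
    def parse(pos):
        if pos >= len(tokens):
            raise ValueError("expression ended before all arguments were read")
        tok = tokens[pos]
        if isinstance(tok, tuple):
            symbol, arity = tok
            args = []
            pos += 1
            # Read exactly `arity` sub-expressions after the operator.
            for _ in range(arity):
                arg, pos = parse(pos)
                args.append(arg)
            return (symbol, *args), pos
        return tok, pos + 1

    tree, end = parse(0)
    if end != len(tokens):
        raise ValueError("unused tokens after a complete expression")
    return tree
```

With this encoding, `parse_prefix([("+", 2), "A", "B"])` returns the tuple `("+", "A", "B")`, and an expression with too few operands raises an error instead of silently producing a partial tree.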

For my purposes, 15 operators are employed, corresponding to the
RELATIONs already defined. Table 2.7 lists the operator symbols which
I shall use, together with their precedence and the RELATION to which
each symbol corresponds. Each sentence type defined within the theory
I propose is given in Table 2.8 together with an example English
Table 2.6 Definition of Prefix-Notation Symbols as Used in the
Specification of SENTENCEs.

SYMBOL    NUMBER OF        PRECEDENCE     RELATION
          ARGUMENTS (n)

a         0 < n < 3        2              MAIN PRIMARY RELATION
b         0 < n [?] 3      2              [illegible]
c         0 < n < [?]      2              AJN and VRB
d         0 < n < 3        2              AJN
e         0 < n < 3        2              PTC
f         0 < n < 3        2              INF
g         0 < n < 3        2              AUX and INF or MOD and INF
h         n = 2            1              ADJECTIVAL ATTRIBUTIVE SECONDARY (AJAS) RELATION
i         n = 2            3              ADVERBIAL ATTRIBUTIVE SECONDARY (AVAS) RELATION
j         n > 2            4              CCP
k         n > 2            1              CCN
l         n = 2            4              SCN
m         n = 2            3              ADJECTIVAL CONJUNCTIVE SECONDARY (AJCS) RELATION
n         [illegible]      [illegible]    ADVERBIAL CONJUNCTIVE SECONDARY (ADCS) RELATION
o         n = 0            1              NOMINAL CONJUNCTIVE SECONDARY (NCS) RELATION

'VERB'    Any form of a specific verb
'word'    A specific lexical entry
N         An upper case letter represents a NAME
sentence for illustration. The following discussion should clarify the
definition of SENTENCEs in these terms. Consider the SENTENCE

The equipment   in    this room   is used   for    testing   compounds.
      A        $h_2$      B        $b_1$   $n_2$    $e_1$        C

Each SIMPLE or COMPOSITE NAME is represented as a unique upper-case
letter.



The SENTENCE contains four RELATIONs: "$h_2$" is an AJAS-RELATION
with 2 arguments, "$b_1$" is a DP-RELATION with 1 argument, "$n_2$" is an
ADCS-RELATION with 2 arguments and "$e_1$" is a RP-RELATION. To
formalize the SENTENCE, begin with the highest precedence operator,
i.e., the AJAS-RELATION "in," which relates "A" and "B" to form the
COMPLEX NAME "$h_2AB$." The operator applied next is the PRIMARY RELATION
operator. Since there are two, simply apply the operators in a
left-to-right manner. "The equipment is used" becomes the COMPLEX NAME
"$b_1h_2AB$"; "testing compounds" becomes the COMPLEX NAME "$e_1C$."
Finally, apply the ADCS-RELATION operator to obtain the COMPLEX NAME

$n_2\,b_1h_2AB\,e_1C$
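
The order of application described above can be replayed step by step. The short Python sketch below is illustrative only: the helper `apply_op` is hypothetical, and the final string is what the prefix convention yields when the ADCS operator is placed in front of its two arguments (the printed result is partly illegible in the source).

```python
def apply_op(symbol, *args):
    """Form a COMPLEX NAME by prefixing an operator to its arguments.

    The arity is written after an underscore, so apply_op("h", "A", "B")
    gives "h_2 A B".
    """
    return f"{symbol}_{len(args)} " + " ".join(args)


# Step 1: the AJAS-RELATION "in" relates A and B.
in_phrase = apply_op("h", "A", "B")
# Step 2: the PRIMARY RELATIONs, applied left to right.
is_used = apply_op("b", in_phrase)
testing = apply_op("e", "C")
# Step 3: the ADCS-RELATION "for" joins the two COMPLEX NAMEs.
sentence = apply_op("n", is_used, testing)
```

After these steps `sentence` holds the string `n_2 b_1 h_2 A B e_1 C`, which is the plain-text form of the COMPLEX NAME derived in the text.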
CHAPTER III. IDENTIFICATION OF SIMPLE NAMEs AND RELATIONs

People studying a foreign language always worry about 
vocabulary. They will stare incredulously at a teacher 
who tells them that vocabulary is not the most important 
part of learning a foreign language. Occasionally a 
student may understand why this is true, but only after 
he has laboriously looked up all the words in a passage 
and still finds that he can make no sense out of the 
assembled words. 

Ann Eljenholm Nichols, English Syntax 

1. Introduction 

In the preceding chapter a theoretical framework was developed 
within which language was viewed as a relational system. Thus, two 
classes of linguistic element were demanded: NAMEs and RELATIONS. 
In this and succeeding chapters I shall attempt to show the usefulness 
of this theory as a guide in developing algorithms which identify 
and label various components of an English text. In this chapter, 
an algorithm is described which classifies words into several classes 
of NAME or RELATION according to the structural properties of the 
(usually) sentence examined. Before describing the algorithm, however, 
a review of some related work is presented in order to provide some 
context within which to view the present research. 

2. Overview of Related Work in Syntactic Analysis 

Syntactic analysis has been studied by many groups of researchers. 
In this section are reviewed those studies in syntactic analysis most 
pertinent to the present work. The section is divided into two parts. 
The first part contains descriptions of four computer programs for
syntactic analysis which, although rather large and complex, are
well-grounded theoretically. In the second part are described six programs
which bear especially close similarity to the program developed in this
research.

Only superficial comparisons among these programs can be made.
Each program has been developed to identify a unique set of grammatical
classes; each has been designed based on a unique set of objectives;
each has been implemented using a different programming language, and
on a different computer. Furthermore, the accuracy of the results
produced by these programs is not always available. Thus, these brief
summaries are written to give the reader an idea of what has been
accomplished and of the obstacles which remain to the improvement of
algorithms for syntactic analysis.
2.1. Review of Syntactic Analysis Procedures -- Part I

The four studies presented in this section give the reader a general
overview of much of the research into syntactic analysis carried out to
date. The procedures developed in each of the four projects have the
common attribute of being quite large and complex. In each procedure
a large lexicon (dictionary) is employed, and it will be seen that
the procedures were developed without any attempt at a distinction between
structural properties of a language and the intensional properties of
the language. The nature of the procedures also necessitates that the
lexicon be exhaustive. And in general, these procedures produce all
possible analyses of a sentence from which one or more preferred ones
may be selected, often manually.

All of the programs assign words to syntactic classes, but
because of the nature of the grammar upon which the programs are based,
word-class assignment is inextricably linked to phrase-group and
clause-group assignment. For this reason, some of these programs will
be mentioned again briefly in the succeeding chapter in which clause
and phrase analysis is discussed. The investigations discussed in this
section are those headed by Kuno and Oettinger, Zwicky, Sager, and
Winograd.

2.1.1. The Multiple-Path Syntactic Analyzer 

A well-known program for syntactic analysis is the Multiple-Path
Syntactic Analyzer developed by Kuno and Oettinger (1). This system
is based on a context-free grammar of 3400 rules and a top-to-bottom
(top-down) analytical procedure employing a pushdown store. The
dictionary used by this procedure gives a highly refined division of
syntactic classes. For example, "are" belongs to three syntactic
classes: one when used as an intransitive verb, one when used as a
finite copula and one when used as an auxiliary to another verb. Each
possible analysis of the sentence is explored in a left-to-right manner
and verified or invalidated by the context-free grammar. The production
of multiple analyses is useful for research purposes, but it is a
decided disadvantage in practical application. Processing time is not
directly dependent upon the length of a sentence but depends primarily
on the number of possible surface structures which the sentence can
generate (2). As an example, one 35-word sentence was reported to
require 12 minutes for analysis (3).

2.1.2. Syntactic Analysis for Transformational Grammar 

Zwicky (4) has included syntactic analysis in an attempt to implement
a transformational grammar for sentences. The first phase of the
program is a phrase-structure grammar, which, at present, handles a
subset (32 rules out of a possible 134 rules have been implemented) of
English. The initial step in Zwicky's procedure consists of a dictionary
look-up of each word in a sentence. The dictionary contains all possible
syntactic classes for a word, along with attributes such as tense,
transitivity, and number. This lexical entry also represents an entry in
terms of more abstract elements; for example, "none" is defined as "neg
any." After all possible syntactic classes for a word in a sentence
have been retrieved, all possible analyses of a sentence are examined.
The sentence 

Can the airplane fly. 

has 15 possible surface structures, since "can" has 5 entries, "airplane"
has 1 entry and "fly" has 3 entries. In the next step a context-free
grammar is applied to each of these analyses, and some structures are
eliminated. Transformational rules are applied next in an effort to
eliminate the spurious surface structures. The grammar is capable of
analyzing sentences with more than one embedded clause. No discussion
of the lexicon and of the rules used in both the context-free grammar
and the transformational grammar is given. Although the process is
only partially implemented [18], Zwicky concludes that highly efficient
routines are needed to obtain the correct surface structures (6).
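
The count of 15 candidate structures is simply the product of the number of dictionary entries per word. The Python sketch below illustrates this with a hypothetical lexicon; the entry labels are placeholders, and "the" is assumed to have a single entry because the text does not mention it.

```python
from itertools import product


def candidate_structures(lexicon, sentence):
    """Enumerate every combination of dictionary entries for a sentence."""
    options = [lexicon[word.lower()] for word in sentence]
    return list(product(*options))


# Placeholder entries: only the counts (5, 1, 3) come from the text.
lexicon = {
    "can": ["can/1", "can/2", "can/3", "can/4", "can/5"],
    "the": ["the/1"],  # assumption: one entry
    "airplane": ["airplane/1"],
    "fly": ["fly/1", "fly/2", "fly/3"],
}

structures = candidate_structures(lexicon, ["Can", "the", "airplane", "fly"])
```

Here `len(structures)` is 5 x 1 x 1 x 3 = 15. The count grows multiplicatively with each ambiguous word, which is why later filtering steps are needed.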
2.1.3. Syntactic Analysis Based on String Analysis

In a project headed by Naomi Sager a string analysis grammar (7)
is the basis of a syntactic analysis program. In this analysis, a
sentence is viewed as a set of elementary strings (clauses). Modifiers
and prepositional phrases are defined as adjuncts. Thus, the sentence

Cars without brakes cause accidents.

is described by the elementary string N tV N (i.e., noun, tensed verb,
noun) and an adjunction class P N (preposition, noun). Rules are
developed for the analysis of relative clauses and of clauses joined by
coordinate conjunctions. The program utilizes a dictionary which
identifies all possible syntactic classes of each word. In the first
step of the program, the possible syntactic classes of each word in the
sentence are retrieved. The assignments made during this step are:

Cars    without    brakes    cause      accidents.
N       P          N/tV      N/tV/V     N

In the next phase of the program, the syntactic categories are
examined in a left-to-right manner, while the grammar rules for elementary
strings and adjuncts are matched to the possible analyses of the
sentence. If a grammar rule fails to match up correctly with the
sentence, the program backs up to the initial word of an adjunct or
string and attempts to apply a different rule. Sager estimates that an
adequate grammar for English could be accomplished with about 150 rules,
plus another 150 restrictions. These restrictions analyze elements of
the elementary string and check for such things as subject-verb agreement
and well-formedness. No discussion is given of the timings, storage,
the accuracy of the program or of the rules which have been developed (8).

18. The lexical look-up and context-free parsing steps have not been
programmed (5).
2.1.4. Syntactic Analysis for the Simulation of Natural Language Processes 

Winograd has recently published a system designed to explore how humans
process language (9). The system includes procedures for both syntactic
and semantic analysis. The syntactic analysis procedures are based on
a systemic theory of grammar (10, 11). Although the theories of this
grammar have not been stated in a unified way, the emphasis of this
grammar is on the "informational units" (12) of a language. Winograd
interprets these "informational units" as amounting to clauses and phrases.

In his procedures for syntactic analysis, Winograd has defined 18
word classes, 4 group types (noun, verb, adjective and preposition) and
2 clause types. The 18 word classes are a finer division of the traditional
word classes. The two types of clauses are denoted as major and
secondary. A major clause is either imperative, declarative or a question.
Associated with each of these units is a set of attributes. Every unit
belonging to a word class, group or clause is assigned a subset of these
attributes. For example, a verb (e.g., "began") may have the attributes
"past," "infinitive"; a preposition group (e.g., "in the kitchen") may
have the attribute "locational object"; and a clause may have attributes
such as "transitive," "passive" and "causality."

The components of the syntactic analysis process are a dictionary, 
a context-free grammar and a push-down list (PDL). The dictionary 
contains the vocabulary of the language, along with each entry's 
syntactic class and attributes. The program which implements the 
context-free grammar contains several functions which give the grammar 
context-sensitive aspects. As an example, one function checks for 
agreement between subject and predicate. A bottom-up parser is used
to apply the context-free grammar.

This process appears to represent a viable system for the context 
in which it is used. While the universe of discourse for the system is 
small, Winograd has succeeded in developing a system which has the 
ability to synthesize sentences and which serves as an interesting tool 
in the investigation of question-answering systems (13). 
2.2. Review of Syntactic Analysis Procedures -- Part II

The investigations presented in this section require neither
exhaustive dictionaries nor particularly complex procedures. Most of
these programs have been based on ad hoc rules rather than on a
well-developed theory of language. The studies presented here are those
headed by Clark and Wall, Klein and Simmons, Stolz, et al., Thorne, et
al., Resnikoff and Dolby, and Woods.
2.2.1. The Economical Parser 

In this study a program was written which assigns words to grammatical
classes, identifies phrase types and marks clause boundaries (14). The
program first performs a dictionary look-up. The dictionary consists of
about 1000 entries. The entries include function words, inflectional
endings, and a list of words which are exceptions to regular
inflection (i.e., "thing" is not a verb even though it ends in "ing").
Words which are not found in the dictionary are assigned an ambiguous
noun/verb category. In a second pass, phrase boundaries are tentatively
identified. In the third pass clause boundaries are identified and
clauses are tested for well-formedness. If a clause does not contain
a verb, noun phrases are examined in a left-to-right manner and the
first word which has been assigned to the noun/verb class is identified
as a verb. Nine types of phrases and eight kinds of subordinate and
relative clauses are identified. The algorithm was applied to abstracts
of technical material and is reported to attain 91% accuracy in the
identification of grammatical classes and 91% accuracy in the
identification of phrases. The processing time is given as 24 words per
minute. The algorithm was written in COBOL and executed on the IBM
7094 (15).

2.2.2. The Computational Grammar Coder

Klein and Simmons have implemented a Computational Grammar Coder
(CGC) which assigns words in English text to the appropriate grammatical
class. The CGC is the initial phase of a syntactic analyzer which is
part of a question answering system. The CGC contains two types of
dictionaries: 1) a function-word dictionary containing about 400 words,
and 2) two dictionaries containing those nouns, verbs, and adjectives
which are exceptions in various suffix tests. For example, while most
words ending in "ing" are verbs, the word "swing" may be a noun. These
dictionaries contained about 1,500 words.

The algorithm of the CGC begins by putting each word through a
series of independent tests. These tests include a function word test,
a capitalization test, a numeral test, and a series of suffix tests.
Each test may result in the assignment of a set of codes to the word.
If a particular test yields no information about a word, the system
assumes that the classes noun, adjective, verb are possible. A final
set of codes is obtained by taking the intersection of the set of codes
assigned in each test.
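
The combination step can be sketched as a set intersection over the results of the individual tests. The Python below is a simplified illustration, not the CGC itself: the three test functions and their tiny rule sets are hypothetical, and uninformative tests are skipped rather than replaced by the default classes, which is a simplification chosen so that a function-word result is not emptied by the default.

```python
OPEN_CLASSES = frozenset({"noun", "adjective", "verb"})

# Hypothetical miniature tests; each returns a set of codes or None.
FUNCTION_WORDS = {"the": {"article"}, "of": {"preposition"}}


def function_word_test(word):
    return FUNCTION_WORDS.get(word.lower())


def ed_suffix_test(word):
    return {"verb", "adjective"} if word.lower().endswith("ed") else None


def capitalization_test(word):
    return {"noun", "adjective"} if word[:1].isupper() else None


def combine_codes(word, tests):
    """Intersect the code sets returned by the informative tests.

    If no test is informative, the open classes noun, adjective and
    verb are all kept as possibilities.
    """
    results = [codes for codes in (test(word) for test in tests) if codes]
    if not results:
        return set(OPEN_CLASSES)
    final = set(results[0])
    for codes in results[1:]:
        final &= codes
    return final


TESTS = [function_word_test, ed_suffix_test, capitalization_test]
```

With these toy rules, "Painted" is narrowed to adjective because that is the only code common to the suffix and capitalization results, while "table" keeps all three open classes.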

After each word in a sentence has been identified in this manner,
the context-frame test is made. This test sequentially processes strings
of ambiguously coded words which are bounded by uniquely coded words.
Every possible combination of codes of an ambiguously coded string is
checked against a context-triad-frame table which contains permissible
combinations of codes in such strings. When one of the sequences of
codes of such a string is found in the context-triad-frame table, then
this unambiguous sequence of codes is assigned. An example will clarify
this process. Consider the sequence of codes:

article    adjective    noun         verb
           verb         adjective

The first word of the string is an article, the second could be either
an adjective or verb, the third either a noun or an adjective, the last
is a verb. The following sequences of codes could be assigned to the
ambiguously coded words:

adjective    noun
adjective    adjective
verb         noun
verb         adjective




The context-triad-frame table contains the following possible codes
for two words which occur between an article and a verb:

adjective    noun
noun         adverb
noun         noun

The only sequence of codes common to the test string and to the table is
ADJ - NOUN; thus, these are the codes which would be assigned.

This phase of the CGC is limited by two factors: 1) a string
containing more than three ambiguously coded words cannot be handled
and 2) out of 2,700 potential table entries, only 500 entries are
included in the context-triad-frame table. The table entries were
empirically derived by analyzing a chapter from the Encyclopedia
Americana. The 500 entries included in the table accounted for 90% of
the sequences of codes which were observed to occur in the test data (16).

The CGC, which was written in JOVIAL for an IBM 7090, uses just
under 14,000 computer words. In tests using scientific text, the system
correctly identified approximately 90% [19] of the words (17).

19. This figure refers to the correct identification of one of 30
classes: adjective, adverb, noun, verb, the verb to be, two
classes of auxiliary verbs, articles, four classes of conjunctions,
four classes of prepositions, and nine kinds of pronouns,
punctuation, and verbs identified by inflectional endings.
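
The context-frame test described in this section amounts to enumerating the candidate code sequences for an ambiguous string and keeping the one that the frame table permits. The Python sketch below illustrates the idea for the article/verb example. The function name and table layout are hypothetical, the three table rows are an assumed reading of the example, and returning no result when zero or several sequences match is an assumption, since the text does not say what happens in that case.

```python
from itertools import product


def resolve_by_frame(left, ambiguous, right, frame_table):
    """Pick the code sequence for an ambiguous string from a frame table.

    `left` and `right` are the unique codes bounding the string, and
    `ambiguous` lists the candidate codes of each word in between.
    Returns the single permitted sequence, or None if zero or several
    sequences are permitted.
    """
    allowed = frame_table.get((left, right), set())
    matches = [combo for combo in product(*ambiguous) if combo in allowed]
    return matches[0] if len(matches) == 1 else None


# Assumed table rows for two words between an article and a verb.
frame_table = {
    ("article", "verb"): {
        ("adjective", "noun"),
        ("noun", "adverb"),
        ("noun", "noun"),
    },
}

ambiguous = [("adjective", "verb"), ("noun", "adjective")]
codes = resolve_by_frame("article", ambiguous, "verb", frame_table)
```

For the example string, the four candidate sequences are checked in turn and only adjective followed by noun appears in the table, so that pair is returned.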
2.2.3. The WISSYN System

The WISSYN system, designed to make grammatical class assignments 
to words in English text, was developed by Stolz, Tannenbaum and 
Carstensen (18). WISSYN contains dictionaries which are similar to 
those used in the CGC (Section 2.2.2.). The function-word dictionary 
contains about 300 words. If a word in a sentence is not found in the 
dictionary of function words, it is checked against a dictionary of 
suffixes. The word must also be checked against a list of about 60 
words which are exceptions to the suffix tests. 

A third phase attempts to resolve the ambiguity of certain
function words. For example the word "that" may be used in the following
ways:

that dog jumped
the dog that jumped

In this phase, a set of frames similar to those implemented by Klein
and Simmons is used to resolve residual ambiguity.

A final phase of WISSYN uses the statistical frequencies of
structural patterns of English sentences to assign grammatical classes.
For this phase, probabilities of individual word classes occurring in
a particular structural pattern were calculated by manually analyzing
English text. The operation of this phase can be easily understood by
considering an illustration. Given a sentence whose elements have been
identified as:

T D N X, P D N P T 
1 2 3 

where T is a terminal marker, D is a determiner, N is a noun, P is a
pronoun, and X_i is the ith unidentified word (i ≥ 1). When an
unidentified word is encountered, the probabilities of all strings
consisting of four or fewer words surrounding the unidentified word are
considered. For the above example, the following strings would be
considered:

T D N X_1 P
D N X_1 P D
N X_1 P D N
X_1 P D N P
T D N X_1
D N X_1
N X_1
X_1 P D N
X_1 P D
D N X_1 P D

Since the longest strings provide the most reliable probabilities, only
the three longest strings are actually used. Thus, the probability of
X_1 being a noun, verb, adjective, or adverb would be calculated by
using the following statistics:

                 CONDITIONAL PROBABILITIES

Predictor        NOUN      VERB      ADJ       ADV

T D N X          .046      .819      .013      .122
X P D N          .438      .359      .068      .135
D N X P N        .017      .591      .078      .314

JOINT PRODUCT    .00034    .17377    .00007    .00517
The element X_1 would be designated as a verb, since this was the most
probable case. The probability tables used include only the 150 most
frequent predictor configurations. When a particular configuration is
not found in the table, then the table is searched for the next longest
string.
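
The joint products in the table can be reproduced by multiplying the conditional probabilities column by column and choosing the class with the largest product. The Python sketch below does this with the three predictor rows quoted in the table; the variable and function names are hypothetical.

```python
CLASSES = ("NOUN", "VERB", "ADJ", "ADV")

# Conditional probabilities for the three longest predictor strings,
# in the column order given by CLASSES.
predictors = {
    "T D N X": (0.046, 0.819, 0.013, 0.122),
    "X P D N": (0.438, 0.359, 0.068, 0.135),
    "D N X P N": (0.017, 0.591, 0.078, 0.314),
}


def joint_products(rows):
    """Multiply the per-class probabilities of all predictor rows."""
    products = [1.0] * len(CLASSES)
    for probs in rows:
        products = [p * q for p, q in zip(products, probs)]
    return dict(zip(CLASSES, products))


scores = joint_products(predictors.values())
best = max(scores, key=scores.get)
```

Rounded to five places, the products agree with the JOINT PRODUCT row of the table, and `best` is the VERB class.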

In a test of literary, scientific, and newspaper articles, an
accuracy as high as 93% [20] was attained. WISSYN has been implemented
on the CDC 1604 in CDC FORTRAN 60. The program, which is contained in
approximately 6500 (48-bit) words of storage, can process about 2500 words
per minute (19).

2.2.4. Syntactic Analysis Based on a Regular Grammar 

Thorne, Bratley, and Dewar have written a syntactic analyzer as
part of a system which assigns the deep structure and surface structure
to English sentences (20). In this syntactic analyzer, 5 types of
sentences, 6 types of clauses, and several other syntactic categories
including gerund subjects, active verbs, modifiers and indirect objects
are identified.

The analyzer employs a dictionary of fewer than 200 words. It
contains function words (referred to as closed-class words), verbs,
suffixes and exceptions to these suffixes. The analyzer is based upon
a form of transformational grammar. The operation of the system is
as follows. First, all of the function words in a sentence are identified.



20. Words were classified according to the 14 categories: nouns, verbs,
adjectives, adverbs, pronouns, determiners, linking verbs, auxiliaries,
intensifiers, prepositions, relative pronouns, subordinators,
connectors, negatives, preverbs, and exclamations.
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Based on these function words and on the information provided by the 
grammar as it relates to the sentence, a set of predictions is made. 
These predictions are tested, and based upon the results of these tests, 
new predictions are made for successive parts of the sentence. All of 
the possible predictions are tested, and every prediction which holds 
produces a distinct analysis for the sentence. 

In some sentences none of the predictions which are made are 
satisfied. This happens when the sentence is improperly constructed or 
because the grammar is not complete. In this case, a program produces 
a message indicating that the analyzer has failed for this sentence.

This system was implemented on the KDF-9 computer. No figures 
were given for the amount of storage required by the program nor for 
the accuracy of the results which were obtained. While no average 
processing times were cited for the operation of the program, several 
examples of sentences which had been analyzed were given, along with the 
time needed to process them. The example which had the longest processing 
time (2.285 sec) contained 9 words, while the example which had the 
shortest processing time (0.427 sec) contained 1 word (21).

2.2.5. A Limited Program for Syntactic Analysis

Resnikoff and Dolby have written a program for grammatical class
assignment which consists of fewer than 100 COMIT instructions (22).



21. Although these authors state that the grammar employed is a context-
free grammar (CFG), these predictions effectively transform it into
a context-sensitive grammar (CSG). This CSG appears to be similar
to that developed by Kuno and Oettinger.

This program utilizes a dictionary of 200 function words and 200 affixes.
At the time their paper was published, they intended to expand the
dictionary to about 1000 words. In preliminary tests on texts which
include parts of Ulysses by James Joyce and a New York Times editorial,
the results were reported as being "evidently high" (23).

2.2.6. Syntactic Analysis Based on Transition Network Grammars

Woods has developed a grammar described in terms of an augmented 
transition network. A recursive transition network is a directed graph 
with labelled states and arcs. The labels on the arcs may be state 
names or terminal symbols. An arc labelled with a state name produces 
the following action: the state at the end of the arc is stored in a 
pushdown stack and control passes to the state indicated by the arc. 
Control is passed back to the first state by popping the stack. This 
network is augmented by adding to each arc of the transition network an 
arbitrary condition which must be satisfied in order for the arc to be 
followed and a set of structure-building actions to be executed if the
arc is followed. The algorithm accepts as input words of a sentence
along with the grammatical class of each word of the sentence. The
algorithm produces a labelled bracketing of the phrases of the sentence,
an identification of the subject, object, verb and an assignment of the
sentence type (e.g., interrogative or declarative). The transition



22. In order to determine the correct grammatical class of a word, two
criteria were used: first, the classification noun is used to signify
both nouns and adjectives, and second, the correct classification
must correspond to that given by either the Oxford Universal
Dictionary or the Merriam-Webster Dictionary.

network model requires about the same (or more) processing time as
predictive analyzers require (see footnote 23). The accuracy that the
augmented transition network attains is not available (24).
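
The push and pop behaviour of a recursive transition network described above can be sketched in a few lines of Python. This is only an illustration: the toy network, its state names (S, S1, NP, ...) and the explicit POP arc are assumptions made for the example and are not taken from Woods' grammar. The arbitrary conditions and structure-building actions that make the network "augmented" are omitted.

```python
# Hypothetical toy network: state names, arcs and word classes are illustrative.
# Each state maps to a list of arcs (label, next_state). A label that is itself
# a state name is a "push" arc; "POP" leaves the current sub-network; any other
# label is a terminal symbol (here, a grammatical class).
NETWORK = {
    "S":   [("NP", "S1")],
    "S1":  [("VRB", "S2")],
    "S2":  [("NP", "S3"), ("POP", None)],
    "S3":  [("POP", None)],
    "NP":  [("DTR", "NP1")],
    "NP1": [("ADJ", "NP1"), ("NON", "NP2")],
    "NP2": [("POP", None)],
}


def accepts(classes, state="S", pos=0, stack=()):
    """Return True if the sequence of word classes is accepted from `state`."""
    for label, nxt in NETWORK[state]:
        if label == "POP":
            if stack:
                # Pop: resume at the state saved when the sub-network was entered.
                if accepts(classes, stack[-1], pos, stack[:-1]):
                    return True
            elif pos == len(classes):
                return True  # top level finished and all input consumed
        elif label in NETWORK:
            # Push arc: store the state at the end of the arc, then pass
            # control to the state named on the arc.
            if accepts(classes, label, pos, stack + (nxt,)):
                return True
        elif pos < len(classes) and classes[pos] == label:
            # Terminal arc: consume one word class.
            if accepts(classes, nxt, pos + 1, stack):
                return True
    return False
```

Calling `accepts(["DTR", "NON", "VRB", "DTR", "ADJ", "NON"])` walks the NP sub-network twice, once for the subject and once for the object, returning each time to the state that was stored on the stack.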

2.3. Summary

In the reviews presented above of research concerned with grammatical
class assignment, two facts emerge. First, the more economically
feasible approaches to grammatical class assignment have treated language
as a relational system (at least implicitly). Second, the more successful
approaches have avoided analytical procedures which depend upon large
lexicons for much of their information.

The purpose of the work described in this chapter was to test the
hypothesis that grammatical class assignment could be effected solely on
the basis of a knowledge of the SECONDARY RELATIONs and certain PRIMARY
RELATIONs contained in a sentence, of the absolute position of each
word in the sentence, and of the position of each word relative to the
RELATIONs surrounding the word.

3. A Basis for Automated Syntactic Analysis

In the identification of NAMEs and RELATIONs, one must begin at the
word level of analysis. If we suppose that an analytical procedure
accepts as input a continuous string of English text, then as a first
step, the individual words of that text must be identified. This
identification requires a definition of WORD which I present in Section
4.2. Once the individual words have been identified, the next step is
to categorize them in some prescribed way. Traditionally, the categories
have been the grammatical classes or parts of speech. Such categories
have been used in this research, but it is important to observe that
their definitions are different from the traditional ones. Before
presenting these definitions, a brief review of earlier work in this
regard is presented (Section 3.1).

23. Woods is apparently referring to the Multiple-Path Syntactic
Analyzer of Oettinger and Kuno.

Upon categorization of the words in an English text, it is then
possible to use a knowledge of these categories and of their sequential
ordering in the text to produce larger aggregates (i.e., either
COMPOSITE or COMPLEX NAMEs).

The categorization of words is treated in this chapter. The
aggregation of these categories into more complex units is dealt with in
Chapter IV.

3.1. Defining Grammatical Classes

The words which comprise the vocabulary of a language are traditionally
classified into eight parts of speech or grammatical classes. Dionysios
Thrax is credited with first proposing eight parts of speech: noun,
verb, participle, article, pronoun, preposition, adverb and conjunction
(25). The interjection is often added to this list. For English,
J. Priestly classified words into the above eight parts of speech in
1761 (26). Some parts of speech are given definitions based upon their
lexical meaning. For instance, a noun is defined as the name of a person,
place or thing. On the other hand, an adjective is defined as a word
which modifies a noun. Such a definition is based upon function.

Gleason (27) has suggested an alternative method of defining
grammatical classes. Three criteria are used as a definitional basis.
The first criterion consists of a paradigm for each of the four classes:
noun, verb, adjective and pronoun. Each paradigm consists of possible
phonemic inflectional endings which signal membership in the appropriate
class. The second criterion lists words whose membership in one of the
four classes is signaled by a change in word form rather than by the
addition of inflectional endings (e.g., mouse and mice) (see footnote 24).
A third criterion is the syntax of the sentence in which a word occurs. This
criterion, Gleason feels, is a less sure basis for word class identification
than the first two. Gleason's approach to grammatical class assignment
suffers because the definition of the classes is not sufficiently precise
to serve as a basis for the development of algorithms and because it
depends upon a subjective determination of a sentence.

Fries, in contrast with Gleason, has defined the parts of speech in
terms of a basic structural frame or pattern (28). Each position within
a structural frame is occupied by a particular word class. Any word
which can fit into a given position in the frame belongs to the
corresponding word class. For illustration, consider the sample frame
sentences:

1. The concert was good (always).

2. The clerk remembered the tax (suddenly).

3. The team went there.

A word which can replace either "concert," "clerk," "tax" or "team"
belongs to Class 1. Words which can replace "was," "remembered" or
"went" are members of Class 2. Any word which can replace "good" is
placed in Class 3, and words which can replace "always," "suddenly" or
"there" are placed in Class 4. These four classes roughly correspond to
the traditional classes noun, verb, adjective and adverb. Fries chose
not to use these terms because of the confusion that might result from
definitional differences.

24. These first two criteria have been developed into a computer-
based stemming system (29).

In addition to the four classes defined above, Fries defined a fifth
class of words (Class 5) whose elements serve as structural markers
within a sentence. These are frequently referred to as function words.
They correspond in part to SECONDARY RELATIONs (Chapter II). The members
of this class are most readily defined ostensively since there appear to
be so few of them (Fries identified just 154). The class is divided into
15 groups which are distinguished by using the same structural criteria
as are used for the first four classes. Some of the larger groups within
this class are the auxiliary verbs, conjunctions, prepositions, relative
pronouns and determiners.

There are significant differences between the first four classes and
Class 5. For the first four classes, lexical meanings are easily separable
from structural meanings. For function words (Class 5) no clear distinction
can be made, perhaps because such words may have no lexical meanings.
For instance, while one can detect differences in meaning among the
prepositions at, by, for and from, clear distinctions cannot be made
without reference to context.

Another important difference between the first four classes and
Class 5 is that whereas the first four classes are open ended, Class 5
appears to be a closed class. As mentioned above, Fries identifies only
154 members of Class 5, while there are many thousands of words in each
of the other four classes. Furthermore, in an analysis of 1000 words of
text, Fries found that function words accounted for almost a third of
the total number of word occurrences. And in a more recent analysis of
more than 1,000,000 words of text, Kucera and Francis have found that
these words account for nearly 46% of the total number of words in the
text studied (30).

Finally, it is easy to show that in order to understand certain
structural signals within English text, Class 5 words must be known as
items. For instance, the two sentences

The boys and the leaders were invited.
The boys of the leaders were invited.
may be analyzed to yield

Class-5 Class-1 Class-5 Class-5 Class-1 Class-5 Class-2
Class-5 Class-1 Class-5 Class-5 Class-1 Class-5 Class-2
The sentences are therefore indistinguishable on this basis. In fact,
the only way in which a structural distinction may be made between the
two sentences is to know the words and and of as items. In other words,
the relationship between boys and leaders is established by the specific
items and and of.
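
The point can be checked mechanically. The following Python sketch is illustrative only: the small lexicon and the function names are assumptions made for the example. It reduces each sentence first to class labels alone, and then to class labels with Class 5 words retained as items.

```python
# Toy lexicon covering only the two example sentences (Fries class numbers).
FRIES_CLASS = {
    "the": 5, "boys": 1, "and": 5, "of": 5,
    "leaders": 1, "were": 5, "invited": 2,
}


def words_of(sentence):
    """Lower-case the sentence, drop the final period and split on spaces."""
    return sentence.lower().rstrip(".").split()


def class_sequence(sentence):
    """Replace every word by its class label."""
    return ["Class-%d" % FRIES_CLASS[w] for w in words_of(sentence)]


def item_sequence(sentence):
    """Keep Class 5 words as items; reduce all other words to class labels."""
    return [w if FRIES_CLASS[w] == 5 else "Class-%d" % FRIES_CLASS[w]
            for w in words_of(sentence)]
```

With class labels alone the two sentences produce the same sequence; with the function words kept as items, the third element is "and" in one sentence and "of" in the other, so the two analyses differ.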
Another good example of the structural information provided by
Class 5 words is found in the poem of the Jabberwocky:

'Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;

All mimsy were the borogoves,

And the mome raths outgrabe . . . .
Alice says that "Somehow it seems to fill my head with ideas - only I
don't exactly know what they are!" (31) The ideas which Alice derives
from the poem are generated by the structural patterns and the
underscored function words (32).

Function words also serve to eliminate ambiguity. Consider the 
sentence: 

Ship sails today (33). 
The ambiguity in this sentence can be avoided by adding function words 
or other structural markers. The above sentence could have any of the 
following meanings: 

The ship sails today. 
Ship the sails today. 
Shipped sail today. 
Ship sailed today. 

The use of function words to resolve ambiguity has also been demonstrated
by the work of Klein and Simmons in sentence generation (34). After
function words were incorporated in their program, lexical ambiguity was
reduced by 90%. Beckman has also demonstrated that the use of function
words in English serves the purpose of an error detecting code (35).

3.2. Formal Definition of Function Words

The work of Fries and others has demonstrated that a knowledge of
function words and of the ways in which they operate within a sentence
provides a concrete basis for the determination of grammatical classes.
But to accomplish the development of an algorithm for grammatical class
assignment, a rigorous definition of function words is required.

The definition provided by Fries (36), although structural in
character, implies through use of the phrase "can replace" that human
judgement is involved in determining class membership. Such a definition
is unsuited for algorithm development. On the other hand, the definition
can be made suitable by only slight modification: if the phrase "can
replace" is replaced by "replaces" then we have a rigorous definition
not only of the function words but of all the grammatical classes. There
is, however, another disadvantage of Fries' definition that is not so
easily overcome. By use of sample frames, grammatical class assignment
becomes a process of pattern comparison. It has been shown (37, 38)
that simple comparison techniques are unfruitful. The development of
viable techniques therefore seems to depend upon a prior knowledge of
sentence structure. Thus the use of sample frames leads to a circularity
in processing, at least when the processing is done algorithmically. As
a consequence, I have endeavored to obtain definitions of the grammatical
classes which do not depend upon reference to any data save that which
is available directly from a sentence.

3.2.1. Definition of Function Words

The class of FUNCTION WORDs is a set of words consisting of both
NAMEs and RELATIONs. FUNCTION WORDs are defined as comprising all
SECONDARY RELATIONs; the AUXILIARYs, MODALs and ADJUNCTs; certain MAIN
RELATIONs; and certain NAMEs. Each of the constituent subsets is defined
ostensively, as follows.

SECONDARY RELATIONs were defined in Chapter II as consisting of
7 subclasses. Certain of these subclasses are now further subdivided.
The complete hierarchy of SECONDARY RELATIONs is shown in Figure 3.1.
The elements of each terminal class are listed in Table 3.1.

Among the PRIMARY RELATIONs, the class of ALLIED PRIMARY RELATIONs,
consisting of the AUXILIARYs, MODALs and ADJUNCTs, is included in the
FUNCTION WORD class. In addition, certain MAIN PRIMARY RELATIONs are
treated as FUNCTION WORD elements. Table 3.2 contains the elements of
PRIMARY RELATION which are members of FUNCTION WORD.

In order to specify what elements of the class NAME are members of
FUNCTION WORD, it is necessary to subdivide NAME as shown in Figure 3.2.
The elements of the terminal classes, which are listed in Table 3.3, are
important structural markers in the identification of the principal
classes NAME and RELATION.

Table 3.4 lists all the terminal classes which are included in
FUNCTION WORD. The class of FUNCTION WORDs is thus completely defined
and its relation to NAMEs and RELATIONs is demonstrated. Of course the
subclasses of which FUNCTION WORD is comprised are not mutually exclusive
so that it is necessary to distinguish between them on structural grounds.
A set of rules necessary for the complete and unambiguous determination
of all word classifications is presented in the next section.
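
Because the subclasses are defined ostensively, class membership reduces to a table lookup, and the lack of mutual exclusivity shows up as a word with more than one candidate class. The Python sketch below is a minimal illustration under stated assumptions: it uses only an excerpt of the word lists in Tables 3.1 and 3.2, and the names `TERMINAL_CLASSES`, `build_index` and `candidate_classes` are hypothetical.

```python
from collections import defaultdict

# Excerpt of the terminal classes, copied from Tables 3.1 and 3.2.
# CCP is left out because it has the same elements as CCN and is
# separated from it only on structural grounds.
TERMINAL_CLASSES = {
    "CCN": "and but nor not or",
    "NEG": "not",
    "NEV": "never",
    "THR": "there",
    "THT": "that",
    "AUX": "am are be been being had has have having is was were",
    "MOD": "can cannot could may might must shall should will would",
}


def build_index(table):
    """Invert the class -> words table into a word -> set of classes index."""
    index = defaultdict(set)
    for cls, words in table.items():
        for word in words.split():
            index[word].add(cls)
    return index


INDEX = build_index(TERMINAL_CLASSES)


def candidate_classes(word):
    """Return the candidate classes of a word (empty if not in the excerpt)."""
    return set(INDEX.get(word.lower(), ()))
```

Here `candidate_classes("not")` yields both CCN and NEG, which is the kind of overlap that the structural rules of the next section are meant to resolve.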
[Figure: tree diagram rooted at SECONDARY RELATION; terminal labels as they survive in the scan: SCN CCN CCP THT THR PNT EOS PRP NEG NEV PRP]

Figure 3.1 Complete hierarchical division of SECONDARY
RELATIONs. Definition of the terminal classes
is given in Table 3.1.

Table 3.1 Specification of the Elements of each Terminal Class in
SECONDARY RELATION.

Class        Elements of the Class

CCN          and but nor not or

CCP          (same as CCN. Distinction made on structural
             basis. See Rule 71, page 90.)

EOS          ? !

NEG          not

NEV          never

PNT          [illegible] ; : [illegible] " ( )

PRP          about above according across after along among
             around at before behind below between [illegible]
             by despite down during except for from in
             inside instead into of off on out outside over
             through throughout to toward under until up
             upon with within without

SCN          although however if since than then therefore
             though thus unless whether yet

THR          there

THT          that

ADJECTIVAL   These classes of CONJUNCTIVE SECONDARY RELATION are
ADVERBIAL    structurally defined (see Chapter III, Section
NOMINAL      3.4.2.2)

Table 3.2 Specification of the Elements of each Class of PRIMARY
RELATION which are Elements of FUNCTION WORD (see footnote 25).

Class        Elements of the Class

AJN          did do does get got let

AUX          am are be been being had has have having
             is was were

MOD          can cannot could may might must shall
             should will would

VRB          added appear appeared applied ask asked based
             became become becomes began believe born bring
             brought called carried closed come comes
             concerned consider considered continue continued
             covered decided described designed determine
             determined developed done dropped established
             expect find followed found gave give given
             gives go going gone happened hear heard held
             included increased indicated interested involved
             keep kept knew know known learned led limited
             lived look looked made make makes married
             meant meet met moved needed obtained opened
             paid passed placed played prepared provide
             provided put raised ran reach reached read
             received related remained remember reported
             required returned rise said sat saw say says
             see seem seemed seems seen sent serve set
             sewed showed shown speak spent started stress
             suggested take taken tell think told took
             tried turned understand walked want wanted
             went worked write written wrote

25. Contractions which contain function words (e.g., I'm) are PRIMARY
RELATIONs, but such contractions are not listed here for the sake
of simplicity.

Table 3.3 Specification of the Elements of NAME which are Elements
of FUNCTION WORD.

Terminal
Class        Elements of the Class

ADV          actually again ago ahead almost alone already
             always away apparently certainly clearly completely
             daily directly early easily especially even
             exactly farther finally forward further generally
             hardly here immediately just later less likely
             merely near nearly obviously often once only
             particularly perhaps probably ready really
             recently simply slowly sometime somewhat soon
             still suddenly today together too usually

AMB          all another both each eight either enough
             few five first four her hundred many million
             more most much neither one ones other own
             same second several six some ten these this

DTR          a an every his its my our the third your

EXP          oh well

INT          rather quite very

PRN1         anyone he him I it me none others she them
             they thing things us you

PRN2         anything everything nothing something

PRN3         herself himself itself myself themselves yourself

REL1         what whatever who whom

REL2         which whose

REL3         how when where while why

Table 3.4 Composition of the Class of FUNCTION WORDs.

MAJOR CLASS          TERMINAL CLASSES

NAME                 ADJ   ADV AMB DTR EXP INT
                     NON   PRN1 PRN2 PRN3 REL1 REL2 REL3

SECONDARY            CCN   CCP EOS NEG NEV PNT
RELATION             PRP   SCN THR THT

PRIMARY              AJN   AUX MOD VRB
RELATION

3.2.2. Rules for Determination of Grammatical Classes

The FUNCTION WORDs together with the set of rules given below
constitute necessary and sufficient conditions for the unambiguous
determination of all grammatical classes present in an English sentence.
The principal classes to be determined are the three subclasses of NAME:
NON, ADJ and ADV; and the three subclasses of PRIMARY RELATION: VRB,
INF (INFINITIVAL) and PTC (PARTICIPIAL). The rules to be described
constitute a set of context sensitive productions which are most easily
written by means of a simple notation. In addition to the names of the
terminal classes given in Table 3.4, the symbols listed in Table 3.5
will be used. A simple example will serve to illustrate the use of the
notation. Consider the rule:

... THR XXX ... => ... THR VRB ...
and its interpretation: in a sentence, if the FUNCTION WORD subclass
THR is immediately followed by an unclassified word, then that word is
assigned to the class VRB.
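
One way to read such a production operationally is as a rewrite over the sequence of class labels assigned so far, with XXX standing for a word not yet classified. The Python sketch below is an assumption about how a rule of this form might be applied, not a description of the program used in this research. The function name `apply_rule` is hypothetical, and only fixed-length patterns without alternatives or repetition are handled.

```python
def apply_rule(tags, lhs, rhs):
    """Rewrite each non-overlapping occurrence of `lhs` in `tags` by `rhs`.

    `tags` is a list of class labels; "XXX" marks an unclassified word.
    `lhs` and `rhs` must have the same length, since a rule reclassifies
    words but never adds or removes them.
    """
    if len(lhs) != len(rhs):
        raise ValueError("a rule must preserve the number of elements")
    out = list(tags)
    width = len(lhs)
    i = 0
    while i + width <= len(out):
        if out[i:i + width] == list(lhs):
            out[i:i + width] = list(rhs)
            i += width  # continue after the rewritten span
        else:
            i += 1
    return out


# The example rule: ... THR XXX ... => ... THR VRB ...
tags = ["THR", "XXX", "DTR", "XXX", "EOS"]
print(apply_rule(tags, ["THR", "XXX"], ["THR", "VRB"]))
```

Only the word immediately after THR is reclassified; the unclassified word after DTR is left for the DTR rules.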

Table 3.5 Definition of Notation Symbols used in the Rules for
Grammatical Class Assignment.

Symbol        Significance

XXX           any element of a sentence that has not been
              classified

ZZZ           any element of a sentence that has already
              been classified (a generic class)

[illegible]   logical not

|             logical or

=>            yields

...           elements of unspecified type may be present

'WORD'        any inflected form of the word enclosed in
              the quote marks

'word'        precisely the word enclosed in the quote
              marks

XXX'ing'      an unclassified word ending in 'ing'

XXX^n         element repeated n times

( )           used to enclose a series of alternatives

3.2.2.1. Rules for the Class DTR

Words in this class invariably initiate NOMINAL PHRASEs. However,
the termination of a NOMINAL PHRASE is not so easily recognized. This
problem is illustrated by the following example. Given the "sentence"

DTR XXX XXX PRP DTR DTR XXX EOS
one might be tempted to write a rule which would yield

DTR ADJ NON PRP DTR DTR NON EOS
The definition of SENTENCE given in Chapter II (p. 36) is, however, not
satisfied, so that some sort of revision of the initial assignments is
necessary. In this instance it is possible to alter the assignments so
that

DTR NON VRB PRP DTR DTR NON EOS
is obtained. I have taken precisely this approach in the present research.
Rules involving the class DTR are designed to identify the start of a
NOMINAL PHRASE, but they ignore the termination problem. Inaccuracies
which this approach brings about are corrected in later stages of the
analysis. The rules involving the class DTR are:
Rule 1:

... DTR XXX ... => ... DTR NON ...

Rule 2:

... DTR XXX^n ... => ... DTR ADJ^(n-1) NON ... (n > 1)

Rule 3:

... DTR (INT|ADV) XXX^n ... =>
... DTR (INT|ADV) ADJ^(n-1) NON ... (n > 1)

Rule 4:

... DTR XXX XXX'ing' ZZZ ... =>
... DTR NON PTC ZZZ ...

Rule 5:

... DTR XXX^n XXX'ing' ZZZ ... =>
... DTR ADJ^(n-1) NON PTC ZZZ ... (n > 1)

Rule 6:

... DTR VRB ... => ... DTR ADJ ...

Rule 7:

... DTR ZZZ ... => ... PRN ZZZ ...
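
Rules 1 and 2 can be read as a single scan: after each DTR, a run of n unclassified words becomes n-1 ADJs followed by one NON. The Python sketch below illustrates that reading only. It ignores Rules 3 through 7 and any ordering among the rules, and the function name `apply_dtr_rules` is hypothetical.

```python
def apply_dtr_rules(tags):
    """Apply Rules 1 and 2: DTR XXX^n => DTR ADJ^(n-1) NON, for n >= 1."""
    out = list(tags)
    i = 0
    while i < len(out):
        if out[i] != "DTR":
            i += 1
            continue
        # Find the run of unclassified words that follows this DTR.
        j = i + 1
        while j < len(out) and out[j] == "XXX":
            j += 1
        n = j - (i + 1)
        if n >= 1:
            out[i + 1:j] = ["ADJ"] * (n - 1) + ["NON"]
        i = j
    return out
```

Applied to the "sentence" DTR XXX XXX PRP DTR DTR XXX EOS discussed above, the sketch produces DTR ADJ NON PRP DTR DTR NON EOS, that is, the initial assignment that later stages of the analysis are expected to revise.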

3.2.2.2. Rules for the Class AMB

The class AMB consists of words which may belong either to the
class DTR or to the class PRN depending upon context. Three simple
rules serve to distinguish between the two classes.

Rule 8:

... AMB ZZZ ... => ... PRN ZZZ ...

Rule 9:

... AMB XXX ... => ... DTR XXX ...

Rule 10:

... (AMB|PRN) 'own' ... => ... PRN VRB ...
Note that the last rule also distinguishes 'own' as a member of VRB.
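
Rules 8 and 9 amount to a one-word lookahead: an AMB word is taken as DTR when the next word is still unclassified, and as PRN when the next word already has a class. The Python sketch below shows that decision under the assumption that the two rules are applied in one left-to-right pass. Rule 10, which needs the word itself rather than its class, is not covered, and `resolve_amb` is a hypothetical name.

```python
def resolve_amb(tags):
    """Apply Rules 8 and 9 to a list of class labels.

    AMB followed by an unclassified word ("XXX") becomes DTR (Rule 9);
    AMB followed by any classified element becomes PRN (Rule 8).
    An AMB at the very end of the list is left unchanged.
    """
    out = list(tags)
    for i in range(len(out) - 1):
        if out[i] == "AMB":
            out[i] = "DTR" if out[i + 1] == "XXX" else "PRN"
    return out
```

For example, the sequence AMB XXX VRB AMB PRP becomes DTR XXX VRB PRN PRP: the first AMB introduces an unclassified word and is read as a determiner, while the second stands before a preposition and is read as a pronoun.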
3.2.2.3. Rules for the Class PRN

The two subclasses of PRN, PRN1 and PRN2, are helpful in identifying
the class VRB. Words in the class PRN2 frequently are followed by the
class ADJ. The following rules are based upon these classes.

Rule 11:

... PRN1 XXX ... => ... PRN1 VRB ...

Rule 12:

... XXX (PRN1|PRN2) ... => ... VRB (PRN1|PRN2) ...

Rule 13:

... PRN[?] XXX (AUX|VRB) ... =>
... PRN[?] ADJ (AUX|VRB) ...

Rule 14:

... PRN2 ZZZ ... => ... PRN[?] VRB ZZZ ...

Rule 15:

... PRN2 XXX XXX ... => ... PRN2 ADJ VRB ...

Rule 16:

... PRN[?] XXX XXX ... => ... PRN[?] ADJ VRB ...

Rule 17:

... PRN[?] XXX ... => ... PRN[?] VRB ...

3.2.2.4. Rules for the Class INT

The class INT gives rise to a single rule.

Rule 18:

... INT XXX ... => ... INT ADJ ...

3.2.2.5. Rules for the Class REL

The class REL is divided into three subclasses, REL1, REL2 and REL3.
The rules involving these classes are as follows.

Rule 19:

... 'which' (AMB|PRN|PRP) ... =>
... REL2 (AMB|PRN|PRP) ...

Rule 20:

... REL1 XXX ... => ... REL1 VRB ...

Rule 21:

... REL2 XXX^n ... => ... REL2 ADJ^(n-1) NON ... (n > 1)

If EOS = ?, then

Rule 22:

... REL3 XXX ... => ... REL3 VRB ...

otherwise,

Rule 23:

... REL[?] XXX'ing' ... => ... REL[?] PTC ...

Rule 24:

... REL[?] XXX ... => ... REL[?] NON ...

Rule 25:

... REL[?] XXX XXX ... => ... REL[?] NON VRB ...

Rule 26:

... REL[?] XXX^n ... => ... REL[?] ADJ^(n-2) NON VRB ... (n > 2)

In general,

Rule 27:

... REL EOS => ... ADV EOS

3.2.2.6. The Class ADV

Words in this class cannot be reliably identified by any rules so 
far considered. For this reason, a list of words in the class ADV has 
been incorporated in the class FUNCTION WORD so that the rules which 
have been developed may prove to function more reliably. The words in 
this group are given in Table 3.3. 
3.2.2.7. Rules for the Classes AJN, AUX and MOD

The elements of these classes are relatively few in number and the
classes are reliable predictors of juxtaposed classes. Therefore a
relatively large number of rules has been devised for these classes.
The form of the rules differs in some instances depending upon the value
of EOS. Rules involving the class AUX follow.

Rule 28:

... 'BE' (ADV|NEG|NEV) 'being' ... =>
... 'BE' (ADV|NEG|NEV) PTC ...

Rule 29:

... 'BE' 'being' ... => ... 'BE' PTC ...

Rule 30:

... 'being' XXX ... => ... (PTC|AUX) ADJ ...

Rule 31:

... 'being' XXX^n ... => ... (PTC|AUX) ADJ^(n-1) NON ...

Rule 32:

... 'BE' (XXX'ing'|XXX'ed') ... => ... AUX VRB ...

Rule 33:

... 'BE' ADV (XXX'ing'|XXX'ed') ... => ... AUX ADV VRB ...

Rule 34:

... 'BE' XXX ... => ... 'BE' ADJ ...

Rule 35:

... 'BE' (INT|ADV) XXX ... => ... AUX (INT|ADV) ADJ ...

Rule 36:

... 'BE' (INT|ADV) XXX^n ... =>
... 'BE' (INT|ADV) ADJ^(n-1) NON ...

Rule 37:

... 'BE' XXX^n ... => ... 'BE' ADJ^(n-1) NON ...

Rule 38:

... 'BE' 'having' ... => ... AUX VRB ...

Rule 39:

... 'BE' (ADV|NEG|NEV) 'having' ... =>
... AUX (ADV|NEG|NEV) VRB ...

Rule 40:

... 'having' XXX'ed' ... => ... PTC PTC ...

Rule 41:

... 'having' 'been' XXX'ed' ... => ... PTC PTC PTC ...

Rule 42:

... 'having' ... => ... PTC ...

Rule 43:

... 'HAVE' XXX'ed' ... => ... 'HAVE' VRB ...

Rule 44:

... 'HAVE' XXX ... => ... 'HAVE' NON ...

Rule 45:

... 'HAVE' XXX^n ... => ... 'HAVE' ADJ^(n-1) NON ... (n > 1)

If EOS = ?, then

Rule 46:

... AUX XXX (AUX|VRB) ... => ... AUX NON (AUX|VRB) ...

Rule 47:

... AUX XXX^n (AUX|VRB) ... =>
... AUX ADJ^(n-1) NON (AUX|VRB) ...

Rule 48:

... AUX XXX ... => ... AUX NON ...

Rule 49:

... AUX XXX XXX ... => ... AUX NON VRB ...

Rule 50:

... AUX XXX^n ... => ... AUX ADJ^(n-2) NON VRB ... (n > 2)
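
Several of the AUX rules test the ending of an unclassified word, so they need the word itself as well as its class. The Python sketch below illustrates Rules 32 and 34 in isolation and makes several assumptions: tokens are (word, tag) pairs, the forms of 'BE' are those listed under AUX in Table 3.2, the interaction with the other AUX rules (in particular the 'being' rules) is ignored, and `apply_be_rules` is a hypothetical name.

```python
# Inflected forms of 'BE', taken from the AUX row of Table 3.2.
BE_FORMS = {"am", "are", "be", "been", "being", "is", "was", "were"}


def apply_be_rules(tokens):
    """Apply Rules 32 and 34 to a list of (word, tag) pairs.

    Rule 32: 'BE' followed by an unclassified word ending in 'ing' or 'ed'
             => AUX VRB.
    Rule 34: 'BE' followed by any other unclassified word
             => 'BE' ADJ (the tag of the 'BE' form is left as it was).
    """
    out = list(tokens)
    for i in range(len(out) - 1):
        word, _ = out[i]
        next_word, next_tag = out[i + 1]
        if word.lower() not in BE_FORMS or next_tag != "XXX":
            continue
        if next_word.lower().endswith(("ing", "ed")):
            out[i] = (word, "AUX")
            out[i + 1] = (next_word, "VRB")
        else:
            out[i + 1] = (next_word, "ADJ")
    return out
```

So "was painted" is tagged AUX VRB, while "is green" leaves the tag of "is" untouched and assigns ADJ to "green".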

The following rules involve the class MOD.

Rule 51:

... MOD XXX ... => ... MOD VRB ...

Rule 52:

... MOD ADV XXX ... => ... MOD ADV VRB ...

Rule 53:

... (AUX|VRB) ... 'can' (DTR|PRP|PRN) ... =>
... (AUX|VRB) ... VRB (DTR|PRP|PRN) ...

Rule 54:

... (PRP|AUX|DTR) ('can'|'may'|'will') ... =>
... (PRP|AUX|DTR) NON ...

Rule 55:

... ('can'|'may'|'will') (PRP|DTR) ... =>
... NON (PRP|DTR) ...

The following rules involve the class AJN.

Rule 56:

... ('get'|'gets') (XXX'ing'|XXX'ed') ... => ... AUX VRB ...

Rule 57:

... ('keep'|'keeps'|'kept') XXX'ing' ... => ... AUX VRB ...

Rule 58:

... ('let'|'lets') XXX ... => ... AUX VRB ...

Rule 59:

... 'let' XXX PRP ... => ... AUX VRB PRP ...

Rule 60:

... 'let' XXX XXX ... => ... AUX NON VRB ...

Rule 61:

... 'let' XXX^n ... => ... 'let' ADJ^(n-2) NON VRB ...

Rule 62:

... 'let' DTR XXX XXX ... => ... 'let' DTR NON VRB ...

Rule 63:

... 'let' DTR XXX^n ... => ... 'let' DTR ADJ^(n-2) NON VRB ... (n > 2)

Rule 64:

... 'let' PRN XXX ... => ... 'let' PRN VRB ...

Rule 65 (see footnote 26):

... ('did'|'does'|'do') NEG XXX ... =>
... ('did'|'does'|'do') NEG VRB ...

Rule 66:

... ('did'|'does') XXX ... => ... AUX VRB ...

Rule 67:

... ('did'|'does') XXX XXX ... => ... AUX VRB NON ...

Rule 68:

... ('did'|'does') XXX^n ... =>
... ('did'|'does') VRB ADJ^(n-2) NON ... (n > 2)

Rule 69:

EOS 'DO' XXX ... => EOS 'DO' VRB ...

26. The forms 'did' NEG, 'does' NEG and 'do' NEG are equivalent to
'didn't', 'doesn't' and 'don't', respectively.
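
The equivalence stated in footnote 26 suggests expanding the three contractions before the rules are applied, so that Rule 65 sees an explicit NEG element. The Python sketch below is one possible arrangement and not a description of the original program: the tokenization, the use of the Table 3.2 class AJN as the initial tag for did/does/do, and the name `expand_and_tag` are all assumptions.

```python
# Contractions named in footnote 26, mapped to their uncontracted first word.
CONTRACTIONS = {"didn't": "did", "doesn't": "does", "don't": "do"}
DO_FORMS = set(CONTRACTIONS.values())


def expand_and_tag(words):
    """Expand the contractions, then apply Rule 65.

    Returns (word, tag) pairs. Words not handled here are tagged "XXX"
    (unclassified). Rule 65: ('did'|'does'|'do') NEG XXX => ... NEG VRB.
    """
    tokens = []
    for w in words:
        base = CONTRACTIONS.get(w.lower())
        if base is None:
            tokens.append((w, "XXX"))
        else:
            tokens.append((base, "AJN"))
            tokens.append(("not", "NEG"))
    for i in range(len(tokens) - 2):
        if (tokens[i][0] in DO_FORMS
                and tokens[i + 1][1] == "NEG"
                and tokens[i + 2][1] == "XXX"):
            tokens[i + 2] = (tokens[i + 2][0], "VRB")
    return tokens
```

For the input words She, didn't, go, the contraction is split into did and not, and the word after NEG is assigned to VRB.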

3.2.2.8. The Class VRB

The PRIMARY RELATIONs are the most difficult to identify reliably.
For this reason a number of the most commonly occurring members of
this major class are included in the FUNCTION WORDs as the class VRB.
The VRB class gives rise to no rules, but assists in the correct
operation of other rules. The elements of VRB are listed in Table 3.2.

3.2.2.9. Rules for the Classes CCN, CCP and SCN

The elements of CCN and CCP are identical. Initially the elements
which comprise these classes are identified as elements of CCN. The
rules presented in this section identify the instances in which a word
is an element of CCP. In later processing, the elements of CCN are
used to identify clause boundaries. Some elements of SCN are also
elements of VRB and PRP. The rules presented here identify these
occurrences and reclassify the SCN element accordingly. The elements
of CCN, CCP and SCN are listed in Table 3.1.
Rule 70:

... XXX CCN (ME|HE|THEM|SHE|UE) ... =>
... NON CCN PRN ...

Rule 71:

... ADJ CCN XXX (EOS|PNT|CCN|SCN|THT) ... =>
... ADJ CCP ADJ ZZZ ...

Rule 72:

... ADJ CCN (ADV|INT) XXX (EOS|PNT|CCN|SCN|THT) ... =>
... ADJ CCP (ADV|INT) ADJ ...

Rule 73:

... (AUX|VRB) CCN XXX (DTR|PRN|PRP|EOS|PNT|CCN|SCN|THT) ... =>
... (AUX|VRB) CCP VRB ZZZ ...

Rule 74:

... NON CCN XXX (EOS|PNT|CCN|SCN|THT) ... =>
... NON CCP NON ZZZ ...

Rule 75:

... SCN XXX'ing' ... => ... SCN PTC ...

Rule 76:

... SCN EOS => ... ADV EOS

Rule 77:

... ('like'|'since') ... (AUX|VRB) ... (EOS|PNT|REL|CCN|SCN|THT) ... =>
... PRP ... (AUX|VRB) ... ZZZ ...

Rule 78:

... 'like' 'to' INF ... => ... VRB PRP INF ...

3.2.2.10. Rules for the Class THR

The class THR, while sometimes having the role of an ADV, generally
initiates a clause and precedes the class VRB. If THR follows VRB or
if it precedes a CONJUNCTIVE SECONDARY RELATION, it is classified as ADV.

Rule 79:

... THR XXX ... => ... THR VRB ...

Rule 80:

... (VRB|AUX) THR ... => ... (AUX|VRB) ADV ...

Rule 81:

... THR PRP ... => ... ADV PRP ...

3.2.2.11. Rules for the Class THT

The class contains the single element "that." The property which
makes this element unique is that the element may belong to either the
AMB class, the SCN class or the CONJUNCTIVE SECONDARY RELATIONs. Rules
which apply to THT are similar to the AMB rules; however, THT is used
in the later stages of MYRA (see Section 3.2.3.) to identify clause
boundaries.

Rule 82:

... THT ZZZ ... => ... PRN ZZZ ...

Rule 83:

... THT XXX ... => ... DTR XXX ...

3.2.2.12. Rules for the Class PRP

Words in this class are used to identify NOMINAL PHRASEs and
elements of INF and PTC. The elements of PRP are listed in Table 3.1.

Rule 84:

... 'to' XXX ... =>
... PRP INF ...

Rule 85:

... 'to' ADV XXX ... =>
... PRP ADV INF ...

Rule 86:

... 'to' XXX^n ... =>
... PRP ADJ^(n-1) NON ...                             n > 1

Rule 87:

... PRP XXX'ing' ... =>
... PRP PTC ...

Rule 88:

... PRP XXX XXX (AUX|VRB) ... =>
... PRP NON NON (AUX|VRB) ...

Rule 89:

... PRP XXX^n (AUX|VRB) ... =>
... PRP ADJ^(n-2) NON NON (AUX|VRB) ...

Rule 90:

... PRP XXX ... =>
... PRP NON ...

Rule 91:

... PRP XXX^n ... =>
... PRP ADJ^(n-1) NON ...                             n > 1

Rule 92:

... PRP (INT|ADV) XXX^n ... =>
... PRP (INT|ADV) ADJ^(n-1) NON ...                   n > 1

Rule 93:

... PRP (EOS|PNT) ... =>
... ADV (EOS|PNT) ...

Rule 94:

... XXX'ing' PRP ... =>
... PTC PRP ...

3.2.2.13. Rules for the Class NEV

This class is important because of its reliability in signalling
the presence of a PRIMARY RELATION. NEV contains the single element
"never."

Rule 95:

... NEV XXX ... =>
... NEV VRB ...

Rule 96:

... NEV ADV XXX ... =>
... NEV ADV VRB ...

Rule 97:

... NEV XXX'ing' ... =>
... NEV PTC ...

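
The three NEV rules can be illustrated with a short sketch. It is written in Python rather than the PL/I of MYRA, and it is an illustration only: the function name, the parallel lists of words and class codes, and the decision to test the 'ing' ending (Rule 97) before falling back to Rule 95 are assumptions made for the example.

```python
def apply_nev_rules(words, classes):
    """Classify the unclassified word (XXX) that follows NEV.

    words   -- the WORDs of one sentence
    classes -- the class code of each word, 'XXX' when still unclassified
    Returns a new list of class codes; the input is not modified.
    """
    out = list(classes)
    for i, code in enumerate(out):
        if code != "NEV":
            continue
        j = i + 1
        # Rule 96 allows one ADV between NEV and the unclassified word.
        if j < len(out) and out[j] == "ADV":
            j += 1
        if j < len(out) and out[j] == "XXX":
            if j == i + 1 and words[j].endswith("ing"):
                out[j] = "PTC"   # Rule 97: NEV XXX'ing' => NEV PTC
            else:
                out[j] = "VRB"   # Rules 95 and 96
    return out
```

Checking the inflectional ending first is a design choice of the sketch: without it, Rule 95 would always fire before Rule 97 could be considered.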
3.2.2.14. Rules for the Class NEG

NEG contains the single element "not." This element can be classed
as either a CCN or as a component of a PRIMARY RELATION. The following
rules differentiate these uses.

Rule 98:

... AUX NEG XXX ... =>
... AUX NEG VRB ...

Rule 99:

... MOD NEG XXX ... =>
... MOD NEG VRB ...

Rule 100:

... NEG XXX'ing' ... =>
... NEG PTC ...

Rule 101:

... NEG ... =>
... CCN ...

3.2.3. Sufficiency of the Rules

In Section 3.2.2. it was stated that a knowledge of FUNCTION WORDs,
together with the rules just described, constitutes necessary and
sufficient conditions for the unambiguous determination of all grammatical
classes as defined in this research. I believe these are necessary
conditions: all elements of FUNCTION WORD are words that can reasonably
be expected to occur in an English sentence and which serve the relational
purpose defined for FUNCTION WORD; and the rules have all been
established on the basis of patterns actually observed in English text
(39). The question is therefore one of sufficiency.

It is not possible to say with certainty that FUNCTION WORD is
complete; therefore it cannot be said to be sufficient on that basis.
However, experimentation has shown (see Section 4) that for practical purposes
it appears sufficient. Thus the question of sufficiency rests with
the rules. Here it is easy to show that they are insufficient, since a
sentence that contains no element of FUNCTION WORD cannot be associated
with any rule so far given. Furthermore, there is ample experimental
evidence to show that even if this case were eliminated from consideration,
there would yet be found structural patterns which the above rules do
not account for. One must conclude therefore that the rules are
insufficient. Can they be made sufficient? The answer to this question
is a qualified "yes." The qualification is that the rules can be made
sufficient in the sense that every element of every sentence can be
classified. The question of accuracy of classification will be dealt
with later.

Two types of rules need to be added to those already presented in
order to complete the sufficiency condition. The first type is concerned
with the classification of sentence elements which have not yet been
classified. The second type deals with reclassification of previously
classified elements so as to satisfy the definition of SENTENCE given
earlier (Section 3.3.3). These rules follow:

Rule 102:

... XXX^n ... => ... VRB ...                          n = 1

Rule 103:

... XXX^n ... => ... NON VRB ...                      n = 2

Rule 104:

... XXX^n ... => ... ADJ^(n-2) NON VRB ...            n > 2

Rule 105:

... XXX^n DTR ADJ^m NON ADJ^p NON ... =>              n = 0
... XXX^n DTR ADJ^(m-1) NON VRB ADJ^p NON ...         m > 0
                                                      p > 0

Rule 106:

... XXX^n NON ADJ^p NON ... =>                        n = 0
... XXX^n VRB ADJ^p NON ...                           p > 1

Rule 107:

... XXX^n ADJ^m NON ADJ^p NON ... =>                  n = 0
... XXX^n ADJ^(m-1) NON VRB ADJ^p NON ...             m > 0
                                                      p > 0

If none of the above rules apply, the following rules are applied in a
left-to-right manner until a VRB assignment is made or until the end
of the sentence is encountered.

Rule 108:

... DTR ADJ^n NON ... => ... DTR ADJ^(n-1) NON VRB ...    n > 0

Rule 109:

... ADJ^n NON ... => ... ADJ^(n-1) NON VRB ...            n > 0

Rule 110:

... NON ... => ... VRB ...

4. MYRA - A Program for Grammatical Class Assignment

A program, called MYRA, has been developed which accepts English
text as input and produces as output each word of text together with the
name of the class to which it belongs. The class of FUNCTION WORDs and
the rules described in Section 3 form the basis for the program.

MYRA is described in this section. Experimental results involving
MYRA are given in Section 5.

4.1. General Description of MYRA

MYRA has been written in PL/I and compiled using the IBM F-level
compiler on an IBM System 370-165 (PHENIX XV,
HASP II). MYRA is capable of processing text at a rate of 13,500 words
of text per minute of C.P.U. usage. Approximately 126,000 bytes of
storage are required for the program and working storage.

The heart of MYRA is a dictionary and a set of rules. Each of these
is discussed in the following paragraphs.

4.1.1. The Dictionary

The class of FUNCTION WORDs defined in Section 3.2.1. is
incorporated in a dictionary. The dictionary is ordered first by the
length (number of characters) of its elements. If several elements
have the same length, then these are ordered alphabetically.

Such an ordering facilitates the dictionary look-up process (see
Section 4.2.1., below). Associated with each element of the dictionary
is a numerical code which identifies the class to which the element
belongs and serves as a unique NAME for the element. These data are
helpful in updating the dictionary. A program has been written to
handle dictionary updates. It produces a dictionary acceptable to MYRA.

Two versions of the dictionary have been established for test
purposes (see Section 5). One, the limited version, consists of all
the terminal classes listed in Table 3.4, except ADV and VRB, and
contains 217 elements. The second, called the extended dictionary,
consists of all the terminal classes of Table 3.4 and contains 431
elements. The extended dictionary is included as Appendix A. The
distribution of elements by length is shown in Figure 3.3.

4.1.2. The Rules

MYRA consists basically of a series of PL/I statements which
correspond to the rules given in Section 3.2.2. The "execution" of these
statements results in the classification of the words of a sentence.

4.2. Operation of MYRA

MYRA accepts English text as input in a continuous string without
any prior formatting or marking. This fact has two implications. First,
MYRA doesn't "know" that the input is English text, but the text will be
processed as though it were. Second, MYRA must break the string up
into individual words. A WORD is defined as any string of characters
bounded by blanks, except that the elements of the classes PNT and EOS
are isolated as individual words and labelled as members of the
appropriate class. Hence, a string such as

    becomes,

is separated into two WORDs:

    becomes

and

    ,

Figure 3.3 A bar graph of the number of elements of a given
length in the extended dictionary.

Once WORDs have been isolated, a preliminary SENTENCE boundary must
be established. Any element of EOS serves this purpose. MYRA is then
ready to effect grammatical class assignments.
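
The partitioning just described can be sketched as follows. The sketch is in Python, not the PL/I of MYRA, and is meant only to make the two steps concrete. The particular characters placed in PNT and EOS and the function names are assumptions for the example; the actual members of these classes are those of Table 3.1. Only trailing marks are split off, so abbreviations and embedded punctuation are not handled.

```python
PNT = {",", ";", ":"}   # assumed punctuation marks (see Table 3.1 for the real class)
EOS = {".", "?", "!"}   # assumed end-of-sentence marks

def split_words(text):
    """Split on blanks, then isolate trailing PNT/EOS characters as WORDs."""
    words = []
    for token in text.split():
        tail = []
        while token and token[-1] in PNT | EOS:
            tail.append(token[-1])
            token = token[:-1]
        if token:
            words.append(token)
        words.extend(reversed(tail))   # restore the original order of the marks
    return words

def split_sentences(words):
    """Close a preliminary SENTENCE at every element of EOS."""
    sentences, current = [], []
    for word in words:
        current.append(word)
        if word in EOS:
            sentences.append(current)
            current = []
    if current:                        # text that ends without an EOS mark
        sentences.append(current)
    return sentences
```

For instance, `split_words("becomes,")` returns the two WORDs `becomes` and `,`, as in the example above.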

MYRA operates on the input text, partitioned as outlined above, in
three stages.

4.2.1. Dictionary Look-Up

In the first stage, the individual words, exclusive of those in PNT
or EOS, are looked up in the dictionary. The input word is matched only
against those dictionary elements of the same length. If a match occurs,
the code for the dictionary element is entered into a vector which
corresponds with the sequence of words between elements of EOS. For
instance, the input string

    The mouse ate the cheese.

would have already been partitioned as

    The/mouse/ate/the/cheese/.

and a corresponding vector (see note 27)

    XXX XXX XXX XXX XXX EOS

set up. If the first word of the string is looked up in the dictionary,
a match will occur and the corresponding code retrieved. When all
WORDs have been looked up (terminating with EOS), the program begins
stage two. For our example, the vector resulting at the end of stage
one will be

    DTR XXX XXX DTR XXX EOS

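
A minimal sketch of the look-up stage is given below, in Python rather than PL/I. The miniature dictionary, the mnemonic codes used in place of MYRA's numerical codes, the case-insensitive comparison, and all function names are assumptions made for illustration. What the sketch is meant to show is the ordering by length and then alphabetically, which confines each search to entries of the same length as the input word.

```python
from bisect import bisect_left

# Hypothetical miniature dictionary; the real one is given in Appendix A.
ENTRIES = [("the", "DTR"), ("a", "DTR"), ("an", "DTR"), ("to", "PRP"),
           ("of", "PRP"), ("not", "NEG"), ("that", "THT"), ("never", "NEV")]
EOS = {".", "?", "!"}   # assumed end-of-sentence marks

def build_dictionary(entries):
    """Group entries by word length; sort each group alphabetically."""
    by_length = {}
    for word, code in entries:
        by_length.setdefault(len(word), []).append((word, code))
    for bucket in by_length.values():
        bucket.sort()
    return by_length

def look_up(word, dictionary):
    """Binary search among entries of the same length; 'XXX' if absent."""
    bucket = dictionary.get(len(word), [])
    key = word.lower()
    i = bisect_left(bucket, (key,))    # (key,) sorts just before (key, code)
    if i < len(bucket) and bucket[i][0] == key:
        return bucket[i][1]
    return "XXX"

def stage_one(words, dictionary):
    """Build the class vector for one partitioned sentence."""
    return ["EOS" if w in EOS else look_up(w, dictionary) for w in words]
```

Applied to the partitioned example sentence, `stage_one` yields the vector DTR XXX XXX DTR XXX EOS shown above.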
4.2.2. Application of Rules 1-101

In the second stage, MYRA applies Rules 1-101. Application is
signalled by the presence in the vector of an element of FUNCTION WORD
or of some other element(s) already classified, so that only rules which
can reasonably be expected to produce classifications are applied. In
our example, we have at first a rule for DTR which can be applied
(Rule 2). Its application yields

    DTR ADJ NON DTR XXX EOS

Moving to the right in the vector, we see that a second rule for DTR is
called for (Rule 1). Its application yields

    DTR ADJ NON DTR NON EOS

Note that in stage two MYRA operates on the vector corresponding to the
input string and does not process the input except in the instance of
those rules that require a particular inflectional ending (i.e., 'ing').

At the end of stage two, the vector may or may not be complete. It
is completed and verified in stage three.

27. I shall use the codes previously defined for illustration, rather
than the numerical codes employed in MYRA.

4.2.3. Application of Rules 102-110

In stage three MYRA first classifies any previously unclassified
WORDs. This is done by application of Rules 102-104. Finally, the
vector is checked to see if a PRIMARY RELATION is present. (If the
class AUX is not part of a COMPOSITE PRIMARY RELATION, AUX is reclassified
as MAIN.) If no PRIMARY RELATION is present, then Rules 105-110 are
applied to reclassify elements of the vector so that a PRIMARY RELATION
is included. The analysis is thus completed and the results are output.
In our example, at the end of stage two the vector contained no PRIMARY
RELATION. Application of Rule 108 causes the sequence

    DTR ADJ NON ...

to be transformed to

    DTR NON VRB ...

so that the final vector would be

    DTR NON VRB DTR NON EOS

The application of MYRA to several English texts and the results obtained
are described in the next section.
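
The reclassification performed in the example can be sketched in Python as follows. This is an illustration and not the PL/I code of MYRA. It covers Rule 108 only, and it approximates the test for a PRIMARY RELATION by the presence of VRB in the vector; both simplifications, and the function names, are assumptions of the sketch.

```python
def apply_rule_108(classes):
    """Rule 108: DTR ADJ^n NON => DTR ADJ^(n-1) NON VRB, for n > 0.

    The vector is scanned left to right and the scan stops at the
    first VRB assignment. A new list is returned.
    """
    out = list(classes)
    for i, code in enumerate(out):
        if code != "DTR":
            continue
        j = i + 1
        while j < len(out) and out[j] == "ADJ":
            j += 1                      # j stops just past the run of ADJ
        if j > i + 1 and j < len(out) and out[j] == "NON":
            out[j - 1], out[j] = "NON", "VRB"
            break
    return out

def stage_three(classes):
    """Apply Rule 108 only when the vector holds no VRB (simplified check)."""
    if "VRB" in classes:
        return list(classes)
    return apply_rule_108(classes)
```

For the vector DTR ADJ NON DTR NON EOS, the last ADJ becomes NON and the following NON becomes VRB, giving the final vector of the example.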

5. Experimental Testing of MYRA

In order to determine the efficacy of MYRA, three bodies of text,
derived from

a) The Need for More Precise Definition of "Algorithm" by
B. A. Trakhtenbrot

b) The Old Man and the Sea by Ernest Hemingway

c) The Clavichord and How to Play It by M. Halford

and totaling about 6000 words, were used. These texts were processed
using both the limited and extended dictionaries (Section 4.1.1.).
The results were then analyzed manually in order to determine the accuracy
of the classifications. Wherever an error was found, its cause was
identified (as far as possible). The accuracy of the classification was
based upon my own intuitive knowledge of English. Hence, I set myself
up as the standard against which the results of MYRA were evaluated.
The results of this analysis are presented in Tables 3.6 and 3.7. Using
the limited dictionary, MYRA produced results that were 91% accurate on
average using the evaluation criterion mentioned above. With the extended
dictionary, MYRA achieved an average accuracy of 94%. More detail
concerning the evaluation of MYRA will be found in Appendix D. Sample
output from MYRA is shown in Figure 3.4. Complete output for the second
body of text (The Old Man and the Sea) is included as
Appendix C.

The old man had taught the boy to fish and the boy loved him.
DTR ADJ NON AUX VRB DTR NON PRP VRB CNJ DTR NON VRB PRN EOS

Figure 3.4 Sample output produced by MYRA from the text The
Old Man and the Sea.

6. Summary

In Chapter II I have presented a theoretical framework suitable
for the syntactic analysis of English text. Based upon this framework,
a program, MYRA, has been developed, implemented and tested which assigns
words in a sentence to their appropriate grammatical classes (i.e.,
identifies them as NAMEs or RELATIONs). The test results have been
analyzed and an accuracy of identification of between 91% and 94% has
been found. Thus, MYRA has been shown to have a theoretical base, to
produce accurate results and to operate at a high rate of speed
(13,500 words per minute of C.P.U. usage).

The output of MYRA forms the input to procedures described in
Chapter IV. Further conclusions which may be drawn from the results so
far obtained in this research will be deferred until the last chapter
of this dissertation.

CHAPTER IV. IDENTIFICATION OF COMPOSITE AND COMPLEX NAMES

All the innumerable substances which occur on earth --
shoes, ships, sealing-wax, cabbages, kings, carpenters,
walruses, oysters, everything we can think of -- can be
analysed into their constituent atoms, either in this
or in other ways. It might be thought that a quite
incredible number of different kinds of atoms would
emerge from the rich variety of substances we find on
earth. Actually the number is quite small. The same
atoms turn up again and again, and the great variety
of substances we find on earth results, not from any
great variety of atoms entering into their composition,
but from the great variety of ways in which a few types
of atoms can be combined.

Sir James Jeans, The Universe Around Us

There are countless ways of writing English sentences.
. . . But sentences in English have certain elements in
common, and when you start to analyze these sentences,
you will find that there are a very few basic sentence
patterns that all writers use.

Ann Eljenholm Nichols, English Syntax

1. Introduction

The preceding chapter has dealt with procedures for identifying
SIMPLE NAMEs and RELATIONs, procedures which were based upon a knowledge
of a special class of WORDs, called FUNCTION WORDs, and upon a set of
rules involving structural patterns. In this chapter I build upon the
previous results. Procedures are described whose purpose is the
identification of COMPOSITE and COMPLEX NAMEs. These procedures are
based upon the definition of COMPOSITE NAME and COMPLEX NAME given in
Chapter II, and upon structural signals provided by elements of
FUNCTION WORD and by the arrangement of SIMPLE NAMEs and SIMPLE RELATIONs
within a sentence as identified by MYRA. Before describing these
procedures, brief mention must be made of certain related research. In
Chapter III, Section 2.1, several projects were reviewed in which either
a form of top-down parser (1, 2, 3, 4) or of bottom-up parser (5) was
used. The output of these parsers included grammatical class assignment,
phrase-type recognition and clause identification. These parsers are
similar to one another in that each employs a large dictionary which
contains all possible grammatical classes for each lexical entry. The
size of the dictionary and the complexity of the procedures result in
long processing times. One possible exception to this statement may be
afforded by the work of Woods (6). Although extant publications by Woods
indicate processing times comparable to those realized for other
top-down parsers, unconfirmed reports indicate that Woods has substantially
decreased his processing times (7).

One of the projects reviewed in Chapter III involved not only the
identification of word classes, but also the identification of nine
phrase types (noun, prepositional, pronoun, infinitive, verb, adverbial,
post-modifying adjective, present participle and past participle) (8).
Stolz, Tannenbaum and Carstensen reported an accuracy of 91% in the
application of their procedures to both technical and nontechnical
abstracts (8). Clauses were also identified, but the details of clause
identification were not given.

In general, previous work on phrase and clause identification has
relied heavily on lexical information and it was therefore of interest
to test the hypothesis that the desired results could be attained
without recourse to extensive dictionaries and with rather minimal
rules. A description of the results of such tests forms the remainder
of this chapter.

2. Procedures for the Identification of CLAUSEs

This section describes a prototype system for the identification
of the class of COMPLEX NAMEs called CLAUSEs as they occur in English
text. This system is one component of the language analysis system
described in this dissertation. An improved version of this prototype
is outlined in Section 4.

The set of procedures which identify CLAUSEs is called CAP/I
(CLAUSE Analysis Procedures/I). Input to CAP/I is the output of MYRA
(Chapter III). The delimitation of CLAUSEs by CAP/I is based primarily
upon structural signals. These signals include the classes of
CONJUNCTIVE SECONDARY RELATION, namely SCN, CCN, CCP, THT and THR, as
well as the classes PNT, EOS, INF and PTC (see Table 3.1 for definition).

CAP/I operates in two phases. CLAUSEs are identified in either
phase according to the structure of the SENTENCE which contains them
and according to the type of CONJUNCTIVE SECONDARY RELATION (if present)
that introduces the CLAUSE. The rules employed in each phase of CAP/I
are described below.

2.1. Phase I of CAP/I

In phase I, CAP/I examines successive WORDs of a SENTENCE until
one of the elements of CCN, THT, PNT or REL is encountered (see
Tables 3.1 and 3.3 for definition of these classes). When one of
these RELATIONs is encountered, rules for the particular RELATION are
applied to the SENTENCE. The purpose of phase I is to make initial
CLAUSE boundary assignments. Other CLAUSE boundaries are added in
phase II.

2.1.1. Rules for the Class CCN

When an element of CCN is encountered, several checks are made to
ascertain that the CCN marks a CLAUSE boundary. No CLAUSE boundary
is marked if any of the following conditions is met.

1. The CCN is the first or second element of the sentence.

2. A WORD that is an element of PNT precedes the CCN.

3. The WORD "either" or "neither" occurs preceding the CCN
element "or" or "nor," respectively.

If none of these conditions is met, a CLAUSE boundary is marked.
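
The three checks can be written out as a small predicate. The sketch below is in Python and is illustrative only; it is not the PL/I of CAP/I. Two readings of the conditions are assumed: "precedes" in condition 2 is taken to mean "immediately precedes," and "either"/"neither" in condition 3 may occur anywhere earlier in the SENTENCE. The function name and argument layout are likewise hypothetical.

```python
def ccn_marks_boundary(words, classes, i):
    """Return True if the CCN at index i is marked as a CLAUSE boundary.

    words   -- the WORDs of one SENTENCE
    classes -- the class code assigned to each WORD by MYRA
    i       -- index of the CCN under examination
    """
    if i < 2:                        # condition 1: first or second element
        return False
    if classes[i - 1] == "PNT":      # condition 2: PNT precedes the CCN
        return False
    partner = {"or": "either", "nor": "neither"}.get(words[i].lower())
    if partner and partner in (w.lower() for w in words[:i]):
        return False                 # condition 3: either...or, neither...nor
    return True
```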

2.1.2. Rules for the Class REL

If REL is preceded by PRP, then the PRP is marked as the CLAUSE
boundary. If REL is preceded by CCN, the CCN is marked as the CLAUSE
boundary. Otherwise, the REL is marked as the CLAUSE boundary.

2.1.3. Rules for the Class THT

In the assignment of WORDs to their respective classes by MYRA,
THT is treated as a subset of the AMB class. CAP/I recognizes THT and
makes several checks to determine whether THT is a CLAUSE marker. THT
does not mark a CLAUSE if:

1. THT is the first element of a SENTENCE;

2. THT is preceded by PRP;

3. The CLAUSE in which THT is contained is initiated by PRP;

4. THT is followed by CCN;

5. THT is preceded by PNT.

2.1.4. Rule for the Class PNT

CAP/I identifies any PNT as a CLAUSE boundary unless the PNT is
the first or second element of the SENTENCE.

Phase I of CAP/I terminates when

1) each occurrence of CCN, REL, THT and/or PNT in a SENTENCE
has been examined;

2) the appropriate rules have been applied; and

3) the preliminary CLAUSE boundaries have been marked.

2.2. Phase II of CAP/I

This phase concerns the application of rules which involve elements
of the classes SCN, INF and PTC. When one of these elements is
encountered in a SENTENCE, a new CLAUSE is indicated. The rules of
phase II are described below.

2.2.1. Rule for the Class SCN

When an element of SCN is encountered in a SENTENCE, a new CLAUSE
boundary is marked. The CLAUSE terminates when a new CLAUSE boundary
is encountered or when EOS is reached.

2.2.2. Rule for the Class INF

When an element of INF is found, the immediately preceding PRP is
marked as a new CLAUSE boundary. The CLAUSE terminates as for SCN.

2.2.3. Rule for the Class PTC

A CLAUSE is marked whenever an element of PTC is found which is
immediately preceded by PRP or SCN. If neither PRP nor SCN immediately
precedes PTC, the CLAUSE boundary is marked at PTC.
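
The phase II rules can be sketched as a single pass over the class vector. The Python below is an illustration, not the PL/I of CAP/I. It assumes that when PTC is immediately preceded by PRP or SCN the boundary is placed at that preceding element, which the text implies but does not state outright; the function name and the use of list indices as boundary markers are also assumptions.

```python
def phase_two_boundaries(classes):
    """Return the sorted indices at which phase II opens a new CLAUSE."""
    boundaries = set()
    for i, code in enumerate(classes):
        prev = classes[i - 1] if i > 0 else None
        if code == "SCN":
            boundaries.add(i)                    # rule for SCN
        elif code == "INF" and prev == "PRP":
            boundaries.add(i - 1)                # rule for INF: mark the PRP
        elif code == "PTC":
            # rule for PTC: mark the preceding PRP/SCN if there is one,
            # otherwise mark the PTC itself
            boundaries.add(i - 1 if prev in ("PRP", "SCN") else i)
    return sorted(boundaries)
```

Each CLAUSE so opened runs until the next boundary or until EOS, so the boundary indices alone are enough to recover the CLAUSE spans.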

2.3. Experimental Results from CAP/I

The procedures of CAP/I are applied to the NAMEs and RELATIONs that
have been identified by MYRA. As mentioned in Chapter III, MYRA was
tested using three English texts comprising some 6,000 words. The
unaltered output of MYRA was used to test CAP/I; hence CAP/I was
tested on automatically classified NAMEs and RELATIONs rather than on
NAMEs and RELATIONs manually classified. This was done to provide a
measure of how MYRA and CAP/I operate together as a system. Sample
output produced by CAP/I is given in Figure 4.1. Complete output for
the article The Old Man and the Sea is given in Appendix C.

Like MYRA, CAP/I is programmed in PL/I for the IBM S/370-165
computer system. 126,000 bytes of main storage are required for the
programs and for working storage. CAP/I operates at the rate of
15,000 words per minute.

2.3.1. Identification and Analysis of CAP/I Errors

The output of CAP/I was examined for errors using the same basic
criteria as employed in the analysis of errors made by MYRA (Chapter
III, Section 3.3). Errors were classified according to whether a CLAUSE
was incorrectly identified or incorrectly delimited. For example, if,
in the SENTENCE

    The girl sitting on the stair won first prize.

the ADJECTIVAL CLAUSE "sitting on the stair" were not recognized, then
one error would be noted for that fact, and one error would be recorded
for the fact that the CLAUSE "The girl won first prize" would be
incorrectly delimited.

303. The old man had taught the boy
     DTR ADJ NON AUX VRB DTR NON

304. to fish
     PRP VRB

305. and the boy loved him.
     CNJ DTR NON VRB PRN EOS

Figure 4.1 Sample output produced by CAP/I from The Old Man
and the Sea.

The overall results of this error analysis for each of the texts
tested are given in Table 4.1. An average accuracy of 62% was found
for the delimitation of CLAUSEs, and an average accuracy of 85% was
determined for the identification of CLAUSEs.

2.3.2. Conclusions From Analysis of CAP/I Results

In analysing the results of CAP/I, the most significant problem was
that of correctly recognizing a CLAUSE. Incorrect CLAUSE recognition
also meant incorrect CLAUSE delimitation. Many of the errors committed
by CAP/I could be corrected by incorporating rules which examined the
patterns of NOMINAL, PRIMARY and SECONDARY PHRASEs. Thus it will be
suggested in Section 4 how an improved CLAUSE identification program
might be developed to take advantage of such data. The necessary data
are provided by the program to be described next.

3. Procedures for the Identification of PHRASEs (PAP)

The procedures described in this section have been designed to
identify and classify certain COMPOSITE and COMPLEX NAMEs called, in
general terms, PHRASEs. Four types of PHRASE are identified by PAP:

NOMINAL PHRASE
PRIMARY PHRASE
SECONDARY PHRASE
ADV PHRASE

The first three PHRASE types have been defined in Chapter II, Section 3.4.3.
An ADV PHRASE is defined as any occurrence of the class ADV outside
the boundaries of a NOMINAL, PRIMARY or SECONDARY PHRASE.

PAP operates in essentially two phases. In the first phase,
certain of the WORD classes identified by MYRA that are linked by CCP
are reduced to a single element of the appropriate class. Thus, a
sequence such as NON CCP NON would be reduced to NON. In the second
phase, PHRASEs are identified, delimited and classified.

3.1. Procedures for the Reduction of Certain COMPLEX NAMEs to SIMPLE
NAMEs

PAP first examines a SENTENCE for CCPs which conjoin two SIMPLE
NAMEs or two SIMPLE RELATIONs. These triples, which are COMPLEX NAMEs,
are reduced by PAP to SIMPLE NAMEs or RELATIONs. These new NAMEs or
RELATIONs retain the WORD class assignments made earlier by MYRA. The
rules for effecting these reductions are as follows.

... NON CCP NON ... => ... NON ...

... ADJ CCP ADJ ... => ... ADJ ...

... ADV CCP ADV ... => ... ADV ...

... VRB CCP VRB ... => ... VRB ...

... PRP CCP PRP ... => ... PRP ...

... AUX CCP AUX ... => ... AUX ...

When the pattern PRN CCP PRN is found, the elements of PRN are examined
for the following patterns.

... ('I'|'she'|'he'|'we'|'they') CCP ('me'|'her'|'him'|'us'|'them') ...
... ('me'|'her'|'him'|'us'|'them') CCP ('I'|'she'|'he'|'we'|'they') ...

If one of these patterns is encountered, no reduction is effected.
Otherwise, the following rule is applied.

... PRN CCP PRN ... => ... PRN ...
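
These reductions lend themselves to a compact sketch. The Python below is illustrative and is not the code of PAP; the function names, the representation of a SENTENCE as parallel lists, and the choice to join the text of a reduced triple into one string are assumptions. Reduced elements may themselves take part in a further reduction, so that a chain such as NON CCP NON CCP NON collapses to a single NON.

```python
REDUCIBLE = {"NON", "ADJ", "ADV", "VRB", "PRP", "AUX", "PRN"}
SUBJECT = {"i", "she", "he", "we", "they"}
OBJECT = {"me", "her", "him", "us", "them"}

def mixed_pronouns(a, b):
    """True when one pronoun is a subject form and the other an object form."""
    a, b = a.lower(), b.lower()
    return (a in SUBJECT and b in OBJECT) or (a in OBJECT and b in SUBJECT)

def reduce_ccp(words, classes):
    """Reduce every triple X CCP X to a single X; return (text, class) pairs."""
    out = []
    for word, code in zip(words, classes):
        out.append((word, code))
        while len(out) >= 3:
            (w1, c1), (w2, c2), (w3, c3) = out[-3:]
            if c2 != "CCP" or c1 != c3 or c1 not in REDUCIBLE:
                break
            if c1 == "PRN" and mixed_pronouns(w1, w3):
                break                    # the pronoun exception: no reduction
            out[-3:] = [(f"{w1} {w2} {w3}", c1)]
    return out
```

The reduced element keeps the class code assigned by MYRA, as the text requires; only the span of text it covers grows.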

3.2. Procedures for the Identification of PHRASEs

In the second phase of PAP, PHRASEs are identified, delimited and
typed. The necessary procedures are based upon both the definition of
each PHRASE type and upon the order in which NAMEs and RELATIONs that
may constitute the PHRASEs occur. The definitions of the four types of
PHRASE are given in Table 4.2.

In this phase of PAP, the class of each WORD in the SENTENCE is
examined in a left-to-right manner. The class of the first WORD in
the SENTENCE determines the type of PHRASE to be delimited at that point.
Consider, for example, the following SENTENCE and corresponding classes
for each WORD.

The boy quickly ran down the trail.
DTR NON ADV VRB PRP DTR NON EOS

The class of the first WORD in the SENTENCE is DTR. This signals a NOMP
(NOMINAL PHRASE) and calls for the application of patterns 1 and 2 of
Table 4.2. Since the second element of the SENTENCE is NON, either of the
patterns is satisfied and the NOMP is isolated.

The next element in our example SENTENCE is ADV, which initiates 
the application of patterns 17, 18 and 21 (Table 4.2). Since the next 
SENTENCE element is VRB, pattern 17 is eliminated, as is pattern 21. 
Hence pattern 18 applies and a PRMP (PRIMARY PHRASE) is isolated. 

The remainder of the SENTENCE is processed by PAP in a similar 
manner. The complete processing of the example SENTENCE by PAP would 
yield 

Table 4.2 Rules for the Identification and Characterization
of PHRASEs.

RULE  WORD CLASS PATTERN             PHRASE TYPE  PHRASE LIMITS
      REQUIRED                       INDICATED    INITIAL  FINAL

 1.   DTR^n ((ADV^n ADJ) ADJ^n) NON  NOMP         /DTR     NON/
 2.   DTR^n ((INT ADJ) ADJ^n) NON    NOMP         /DTR     NON/
 3.   ADJ^n NON                      NOMP         /ADJ     NON/
 4.   NON                            NOMP         /NON     NON/
 5.   INT ADJ NON                    NOMP         /INT     NON/
 6.   PRN                            NOMP         /PRN     PRN/
 7.   PRN ADJ                        NOMP         /PRN     ADJ/
 8.   REL NOMP                       NOMP         /REL     NOMP/
 9.   NEV AUX^n ((INT) ADV) VRB      PRMP         /NEV     VRB/
10.   NEV AUX^n                      PRMP         /NEV     AUX/
11.   NEV PTC^n                      PRMP         /NEV     PTC/
12.   NEV 'to' AUX ((INT) ADV) VRB   PRMP         /NEV     VRB/
13.   NEV 'to' ((INT) ADV) VRB       PRMP         /NEV     VRB/
14.   AUX^n NEG VRB                  PRMP         /AUX     VRB/
15.   AUX^n VRB NEG                  PRMP         /AUX     NEG/
16.   AUX^n ((INT) ADV) VRB          PRMP         /AUX     ADV/
17.   AUX^n VRB ((INT) ADV)          PRMP         /AUX     ADV/
18.   ADV AUX VRB                    PRMP         /ADV     VRB/
19.   ADV VRB                        PRMP         /ADV     VRB/
20.   VRB ADV                        PRMP         /VRB     ADV/
21.   PRP^n NOMP                     SECP         /PRP     NON/
22.   ADV                            ADVP         /ADV     ADV/
23.   PRP (PNT)                      ADVP         /PRP     PRP/
24.   PRP (EOS)                      ADVP         /PRP     PRP/

The boy quickly ran down the trail.

DTR NON ADV VRB PRP DTR NON EOS
| NOMP | PRMP | SECP |

Processing is terminated when EOS is encountered.
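
The left-to-right delimitation of PHRASEs can be sketched with a handful of patterns. The Python below is an illustration only and covers a simplified subset of Table 4.2: a NOMP built from optional DTR and ADJ elements followed by NON (or a lone PRN), a PRMP of the form ADV AUX VRB with the ADV and AUX optional, a SECP of one or more PRP followed by a NOMP, and a lone ADV as ADVP. The regular expressions, the ordering of the patterns, and the function name are assumptions of the sketch, not the procedure used by PAP.

```python
import re

# Each class code is followed by one blank in the string that is scanned.
NOMP = r"(?:(?:DTR )*(?:ADJ )*NON |PRN )"
PATTERNS = [
    ("SECP", re.compile(r"(?:PRP )+" + NOMP)),
    ("PRMP", re.compile(r"(?:ADV )?(?:AUX )*VRB ")),
    ("NOMP", re.compile(NOMP)),
    ("ADVP", re.compile(r"ADV ")),
]

def find_phrases(classes):
    """Return (type, first_index, last_index) for each PHRASE found."""
    text = "".join(code + " " for code in classes)
    phrases, pos, tok = [], 0, 0
    while tok < len(classes):
        for name, pattern in PATTERNS:
            m = pattern.match(text, pos)
            if m:
                n = len(m.group().split())     # number of WORDs covered
                phrases.append((name, tok, tok + n - 1))
                pos, tok = m.end(), tok + n
                break
        else:
            pos += len(classes[tok]) + 1       # no pattern starts here; skip
            tok += 1
    return phrases
```

Applied to the class vector of the example SENTENCE, the sketch isolates a NOMP over the first two WORDs, a PRMP over the next two and a SECP over the following three, in agreement with the result shown above.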

3.3. Procedures for the Reduction of Certain COMPLEX NAMEs to PHRASEs

Once PHRASEs have been delimited, it may be found that two or more
of them are conjoined by CCP. For example,

DTR NON ADV VRB PRP DTR NON CCP PRP ADJ NON EOS
| NOMP | PRMP | SECP | | SECP |

Under these circumstances, it may be desirable to reduce such a COMPLEX
NAME to a PHRASE of the same type as those in the COMPLEX NAME. Such a
reduction is accomplished in the same manner as reduction of COMPLEX
NAMEs and RELATIONs to SIMPLE NAMEs and RELATIONs (Section 3.1). The
following rules serve for the reduction of COMPLEX NAMEs to PHRASEs.

... ADVP CCP ADVP ... => ... ADVP ...

... NOMP CCP NOMP ... => ... NOMP ...

... PRMP CCP PRMP ... => ... PRMP ...

... SECP CCP SECP ... => ... SECP ...

Thus, in the following example, two NOMINAL PHRASEs linked by CCP,

The suitcases and packages were left on the plane.

DTR NON CCP NON AUX VRB PRP DTR NON EOS
| NOMP | | NOMP | PRMP | SECP |

are reduced to a single NOMP, hence:

DTR NON CCP NON AUX VRB PRP DTR NON EOS
| NOMP | PRMP | SECP |
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3.3,1, Reductions Not Involving CCP 

ADJECTIVAL SECONDARY RELATIONS introduce PHRASEs that may be 
combined with a preceding NOMINAL PHRASE to form a new NOMINAL PHRASE. 
To accomplish such a reduction, it is first necessary to distinguish 
between ADJECTIVAL and ADVERBIAL SECONDARY RELATIONS, The distinction 
between these two classes is based upon the structure in which the RELATION occurs • 
The following rules serve to distinguish ADJECTIVAL SECONDARY RELATIONS 
from ADVERBIAL SECONDARY RELATIONS which are therefore defined by default. 

1. The SECONDARY REIATION "of" introduces an ADJECTIVAL SECONDARY 
PHRASE if it initiates a SECONDARY PHRASE and if it is 
preceded by a NOMINAL PHRASE. 

2. Any SECONDARY RELATION introduces an ADJECTIVAL SECONDARY 
PHRASE if it initiates a SECONDARY PHRASE that is preceded 
by a NOMINAL PHRASE and followed by a PRIMARY PHRASE • 

An ADJECTIVAL SECONDARY RELATION is considered to 
be an attribute of the NOMINAL PHRASE which precedes it. Thus, the 
NOMINAL PHRASE and the SECONDARY PHRASE are reduced to a single NOMINAL 
PHRASE. As the following example illustrates, the SENTENCE

The present for the children was filled with bags of candy.

DTR NON PRP DTR NON AUX VRB PRP NON PRP NON EOS
| NOMP | SECP | PRMP | SECP | SECP |

becomes (see note 29)

The present for the children was filled with bags of candy.

DTR NON PRP DTR NON AUX VRB PRP NON PRP NON EOS
| NOMP | PRMP | SECP |

29. By using such procedures, the SENTENCE "The children's present was
filled with candy bags," would be assigned the same PHRASE structure.
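
The two rules for ADJECTIVAL SECONDARY PHRASEs, and the reduction they license, might be sketched as follows. The representation (a list of `(phrase_type, introducing_word)` pairs) and the function name are hypothetical. One further assumption is made so that the sketch reproduces the example above: an "of" PHRASE is also absorbed when the preceding PHRASE is a SECONDARY PHRASE, on the reading that the NOMINAL material it modifies lies inside that PHRASE ("with bags" + "of candy").

```python
def reduce_adjectival(phrases):
    """Absorb ADJECTIVAL SECONDARY PHRASEs into the PHRASE before them.

    phrases: list of (phrase_type, introducing_word or None).
    """
    out = []
    for idx, (ptype, word) in enumerate(phrases):
        following = phrases[idx + 1][0] if idx + 1 < len(phrases) else None
        preceding = out[-1][0] if out else None
        if ptype == "SECP":
            # Rule 1: "of" after nominal material (the SECP case is an
            # assumption of this sketch, see the lead-in).
            rule1 = word == "of" and preceding in ("NOMP", "SECP")
            # Rule 2: any SECONDARY RELATION between a NOMP and a PRMP.
            rule2 = preceding == "NOMP" and following == "PRMP"
            if rule1 or rule2:
                continue  # reduced into the preceding PHRASE
        out.append((ptype, word))
    return out

# "The present for the children was filled with bags of candy."
sentence = [("NOMP", None), ("SECP", "for"), ("PRMP", None),
            ("SECP", "with"), ("SECP", "of")]
result = [ptype for ptype, _ in reduce_adjectival(sentence)]
```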

3.4. Results of Tests

The WORD and CLAUSE assignments produced by MYRA and CAP/I form
the input to PAP. PAP was tested on the data generated from the three
articles mentioned earlier. An average accuracy of 90% was attained in
correctly delimiting PHRASEs and an accuracy of 92% was attained in
correctly identifying PHRASEs. The results produced by PAP for each of
the three documents analyzed are presented in Table 4.3. Sample
output produced by PAP is illustrated in Figure 4.2. More extensive
output may be found in Appendix C.

Most of the error which occurred both in delimiting and identifying 
PHRASEs was caused by errors in MYRA and CAP. Only a small percentage 
of error was directly caused by the definition of the structure of 
PHRASE types. A detailed analysis of the results generated by PAP is 
presented in Appendix E.

In this and the preceding chapter, three programs have been described:
MYRA, CAP/I and PAP. The organization of these programs into a system
for language analysis is illustrated in Figure 4.3. As alluded to in
the discussion of CAP/I, this organization is somewhat illogical in that
knowledge of PHRASE boundaries would be of help in identifying and
delimiting CLAUSES, whereas the converse does not hold. Therefore, this
organization has been modified as depicted in Figure 4.4. PAP now
derives its input directly from MYRA, and a new CLAUSE program, CAP/II,
has been designed which accepts input both from MYRA and from PAP.
CAP/II is described in the next section.

303. The old man   had taught   the boy
     DTR ADJ NON   AUX VRB      DTR NON
     | NOMP      | PRMP       | NOMP   |

304. to fish
     PRP VRB
     | PRMP |

305. and the boy   loved    him    .
     CNJ DTR NON   VRB      PRN    EOS
         | NOMP  | PRMP   | NOMP |

Figure 4.2 Sample output produced by PAP from The Old Man
and the Sea.

[Flow diagram. Recoverable labels: COMPLEX NAMEs (CLAUSE); PAP;
COMPOSITE AND COMPLEX NAMEs (PHRASE).]

Figure 4.3 The operation of the language analysis procedures.

[Flow diagram. Recoverable labels: INPUT ENGLISH TEXT; COMPOSITE AND
COMPLEX NAMEs (PHRASE); COMPLEX NAMEs (CLAUSE).]

Figure 4.4 The operation of the language analysis procedures
with the refined version of clause recognition.

4. Improved Procedures for the Identification of CLAUSES

CAP/II incorporates not only those procedures embodied in CAP/I,
but also contains a number of additional procedures that take advantage
of the output of PAP. Phase I of CAP/II corresponds with phase I of
CAP/I. Phase II of CAP/II is carried out in the same way as for CAP/I,
but now no tests are made involving the classes INF and PTC. After
insertion of the CLAUSE markers, the strings are examined to determine
the accuracy of the CLAUSE-boundary assignment. If a PRIMARY RELATION
is not found within CLAUSE boundaries, one or both boundaries are
deleted. If a CLAUSE initiates a SENTENCE, its right boundary is
deleted; otherwise a right boundary is deleted until a PRIMARY PHRASE
is found or until EOS. If no PRIMARY PHRASE is found, left boundaries
are deleted, from right to left, until a PRIMARY PHRASE is found. In
the next step, CAP/II examines PHRASE patterns to determine whether
CLAUSES are present which have not yet been identified.

4.1. Rules for Identifying CLAUSES which Involve the Class INF

CAP/II contains a number of rules for the identification of CLAUSES
based upon patterns of PHRASEs. Five of these rules involve the
Class INF (a class of PRIMARY RELATION). These rules are:

 1. ... NOMP PRMP' ... → .../NOMP PRMP' ...

*2. ... PRMP' SECP^n PRMP ... → .../PRMP' SECP^n/PRMP ...

*3. ... PRMP' NOMP SECP^n PRMP ... → .../PRMP' NOMP SECP^n/PRMP ...

*4. ... PRMP' NOMP SECP^n NOMP PRMP ... →
        .../PRMP' NOMP SECP^n NOMP/PRMP ...

*5. ... PRMP' SECP^n NOMP ... PRMP ... →
        .../PRMP' SECP^n/NOMP ... PRMP ...

(where PRMP' is a PRIMARY PHRASE containing INF, n > 1, "/" marks a
CLAUSE boundary, and rules marked with an asterisk are interpreted
as placing a clause marker before PRMP' only if rule 1 cannot be
applied).

If a PRMP' is immediately preceded by NOMP, the NOMP initiates the
CLAUSE. Otherwise, PRMP' initiates the CLAUSE. The right boundary of
each CLAUSE identified by rules 2-5 is marked as indicated in the rules.

4.2. Rules for Identifying CLAUSES which Involve the Class PTC

The rules in which PTC is involved are:

6. ... PRMP* SECP^n PRMP ... → .../PRMP* SECP^n/PRMP ...

7. ... PRMP* NOMP SECP^n PRMP ... →
       .../PRMP* NOMP SECP^n/PRMP ...

8. ... PRMP* (NOMP SECP^n)^2 PRMP ... →
       .../PRMP* (NOMP SECP^n)^2/PRMP ...

9. ... PRMP* (NOMP SECP^n)^2 NOMP ... PRMP ... →
       .../PRMP* (NOMP SECP^n)^2/NOMP ... PRMP ...

(where PRMP* is a PRIMARY PHRASE containing PTC, n > 0 and "/" marks
a CLAUSE boundary). PRMP* always initiates a CLAUSE. If PRMP* is preceded
by a NOMINAL PHRASE, the CLAUSE initiated by PRMP* is ADJECTIVAL (see
Figure 2.7, Chapter II); otherwise the CLAUSE is NOMINAL.
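
The closing remark of this section (PRMP* always initiates a CLAUSE, which is ADJECTIVAL after a NOMINAL PHRASE and NOMINAL otherwise) lends itself to a short sketch. The input format, a list of `(phrase_type, contains_ptc)` pairs, and the function name are assumptions made for illustration; CAP/II itself is described here only at the level of rules.

```python
def classify_ptc_clauses(phrases):
    """Return (index, clause_kind) for every PRIMARY PHRASE containing PTC.

    phrases: list of (phrase_type, contains_ptc) pairs.
    """
    found = []
    for i, (ptype, contains_ptc) in enumerate(phrases):
        if ptype == "PRMP" and contains_ptc:
            after_nominal = i > 0 and phrases[i - 1][0] == "NOMP"
            kind = "ADJECTIVAL" if after_nominal else "NOMINAL"
            found.append((i, kind))
    return found

# NOMP followed by a participial PRMP: the CLAUSE modifies the NOMP.
adjectival = classify_ptc_clauses(
    [("NOMP", False), ("PRMP", True), ("SECP", False), ("PRMP", False)])
# Participial PRMP at the start: the CLAUSE is NOMINAL.
nominal = classify_ptc_clauses(
    [("PRMP", True), ("NOMP", False), ("PRMP", False)])
```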

4.3. Other CLAUSE Identification Rules

After all of the CLAUSE identification procedures so far described
have been applied by CAP/II, a final check is made of the SENTENCE for
any unidentified CLAUSEs according to the following rules:

10. ... PRMP SECP^n PRMP ... →
        ... PRMP SECP^n/PRMP ...

11. ... PRMP SECP^n NOMP PRMP ... →
        ... PRMP SECP^n/NOMP PRMP ...

12. ... PRMP NOMP SECP^n PRMP ... →
        ... PRMP/NOMP SECP^n PRMP ...

(where n > 0 and "/" marks a boundary between CLAUSEs). These rules
serve only to mark the boundary between CLAUSEs. The remaining CLAUSE
boundaries are determined by the other procedures in CAP/II (for
instance, EOS would mark the right boundary of a CLAUSE whose left
boundary was marked by use of one of the rules 10-12).
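
Rules 10-12 can be read as a scan over each pair of neighbouring PRIMARY PHRASEs. The following is a minimal sketch of that reading, not the CAP/II procedure itself (which, as noted below, had not been implemented). The function name and the `min_secp` parameter are hypothetical; `min_secp` sets the smallest number of SECONDARY PHRASEs accepted for SECP^n and defaults to zero, which is an assumption of this sketch.

```python
def mark_clause_boundaries(phrases, min_secp=0):
    """Return sorted indices of PHRASEs that begin a new CLAUSE."""
    def secp_run(segment):
        return len(segment) >= min_secp and all(p == "SECP" for p in segment)

    boundaries = set()
    prmp_at = [i for i, p in enumerate(phrases) if p == "PRMP"]
    for left, right in zip(prmp_at, prmp_at[1:]):
        between = phrases[left + 1:right]
        if secp_run(between):
            # Rule 10: PRMP SECP^n / PRMP
            boundaries.add(right)
        elif between and between[-1] == "NOMP" and secp_run(between[:-1]):
            # Rule 11: PRMP SECP^n / NOMP PRMP
            boundaries.add(right - 1)
        elif between and between[0] == "NOMP" and secp_run(between[1:]):
            # Rule 12: PRMP / NOMP SECP^n PRMP
            boundaries.add(left + 1)
    return sorted(boundaries)

rule10 = mark_clause_boundaries(["NOMP", "PRMP", "SECP", "PRMP", "NOMP"])
rule11 = mark_clause_boundaries(["NOMP", "PRMP", "SECP", "NOMP", "PRMP"])
rule12 = mark_clause_boundaries(["NOMP", "PRMP", "NOMP", "SECP", "PRMP"])
```

Patterns between two PRIMARY PHRASEs that match none of the three rules are left unmarked, in keeping with the statement that the remaining boundaries are determined elsewhere.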

Although CAP/II has not been implemented, the procedures which it
embodies have been tested manually and have been shown to give much
better results than CAP/I. Implementation and testing of CAP/II are
currently being carried out.

This chapter has described two programs which are currently in
operation (CAP/I and PAP) and has described the design of a third
program (CAP/II) which is currently under construction. These programs
provide a means for analyzing English text in terms of NAMEs and
RELATIONs on several different levels (i.e., SIMPLE NAMEs and RELATIONs,
CLAUSEs and PHRASEs). The output generated by these programs has been
illustrated and the accuracy of the result has been detailed. A final
program developed in this research, and described in the next chapter,
builds upon the results of MYRA, CAP/I and PAP.

CHAPTER V. THE FUNCTION OF NAMES AND RELATIONS

1. Introduction

In previous chapters, procedures have been described which identify
NAMEs and RELATIONs on several levels of complexity. These procedures
are largely structurally based. We turn now to a consideration of
procedures which deal with the function of NAMEs and RELATIONs within
a SENTENCE. Function is here described in terms of the theory of case
grammar formalized by Fillmore (1, 2).

A brief review of the notions of case (in the Fillmorian sense) is 
presented first, followed by a description of Fillmore's original theory. 
Then a modification of his theory developed for this research is presented 
and finally, procedures are described which categorize elements of a 
SENTENCE according to their function within this framework. 

2. Case Grammar 

The notions of case have been discussed for many years, but commonly
in terms that modern linguists find to be of little practical utility.
The traditional cases included nominative, genitive, dative, and
accusative. In contrast, case grammar is concerned with the role played
by various elements of a sentence. Thus, typical cases are agent,
object, experiencer, beneficiary, locative and so on.
Several intuitive definitions of this type of case role are to
be found in the information storage and retrieval literature (3, 4, 5, 6).
Case roles are also to be found in one guise or another in the work of
Schank and Tesler (7), Winograd (8), Woods (9), Quillian (10) and others.
(Some details of the work of these authors have been given in previous
chapters.)

Case grammar was first presented by Charles Fillmore in 1966 (1). 
However, the classic statement of case grammar is found in "The Case 
for Case" presented by Fillmore in 1968 (2). Fillmore himself has 
modified his model of case grammar over the past few years (11, 12) and 
many workers involved in semantic analysis have adopted some form of 
case grammar. For a detailed description of Fillmore's positions and 
of work in which modifications of case grammar have been made for 
specific uses, the reader is referred to (12, 13). The model of case 
grammar given here differs from those presented by Fillmore in two major 
respects: first the PRIMARY RELATION is viewed as central and demands 
certain case roles; second, case roles must depend minimally upon 
extralinguistic evidence for their identification. The model of case 
grammar presented below builds upon Fillmore's work, and includes 
conceptualizations of Chafe and Cook (14, 15, 16, 17, 18). 

2.1. The Basic Components of Case Grammar

Case grammar postulates that 1) the basic unit of text for analysis
is the CLAUSE; 2) a CLAUSE consists of a series of NOMINAL and
SECONDARY PHRASEs which are non-linearly related to a PRIMARY PHRASE.
PRIMARY PHRASEs are partitioned into several categories, and NOMINAL and
SECONDARY PHRASEs are partitioned into categories under the control of
the category of PRIMARY PHRASE. A category of NOMINAL or SECONDARY
PHRASE is called a case grammar role, or case role. The structure of
a CLAUSE is represented in terms of the category of the PRIMARY PHRASE
and in terms of the case roles assigned to the NOMINAL or SECONDARY
PHRASEs. This representation is referred to as a case frame.

2.2. Categories of PRIMARY RELATION

Case grammar defines five categories of PRIMARY PHRASE:

     AGENTIVE
     BENEFACTIVE
     EXPERIENCER
     REFLEXIVE
     STATIVE

The distinction between these categories is made on the basis of the
identity of the MAIN PRIMARY RELATION in the PRIMARY PHRASE. For the
BENEFACTIVE category, the MAIN PRIMARY RELATION elements are:

     had       has       have      having

The EXPERIENCER category is defined by the following elements of MAIN
PRIMARY RELATION:

     died      hope      see
     doubt     know      smell
     fear      like      understand
     feel      love      want
     hear      remember  wish
                         wonder

The MAIN PRIMARY RELATIONs in the category STATIVE are:

     am        being
     are       is
     be        was
     been      were

Thus, the categories BENEFACTIVE, EXPERIENCER and STATIVE are defined
ostensively. The category REFLEXIVE is defined as being comprised of
those PRIMARY PHRASEs (other than those in the categories BENEFACTIVE,
EXPERIENCER or STATIVE) which are preceded in a CLAUSE by exactly one
NOMINAL PHRASE and which are followed by no NOMINAL PHRASE. The
category AGENTIVE is defined by default.
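
The categorization just described (three word lists, a positional test for REFLEXIVE, and AGENTIVE by default) can be sketched as a small lookup. This is an illustrative reading, not the code of the program described later in this chapter: the function name and its arguments are hypothetical, and the sketch matches only the exact word forms listed above.

```python
BENEFACTIVE = {"had", "has", "have", "having"}
EXPERIENCER = {"died", "doubt", "fear", "feel", "hear", "hope", "know",
               "like", "love", "remember", "see", "smell", "understand",
               "want", "wish", "wonder"}
STATIVE = {"am", "are", "be", "been", "being", "is", "was", "were"}

def categorize_primary_phrase(main_relation, nominals_before, nominals_after):
    """Return the category of a PRIMARY PHRASE.

    main_relation: the MAIN PRIMARY RELATION word of the PHRASE.
    nominals_before / nominals_after: counts of NOMINAL PHRASEs that
    precede / follow the PRIMARY PHRASE within its CLAUSE.
    """
    word = main_relation.lower()
    if word in BENEFACTIVE:
        return "BENEFACTIVE"
    if word in EXPERIENCER:
        return "EXPERIENCER"
    if word in STATIVE:
        return "STATIVE"
    if nominals_before == 1 and nominals_after == 0:
        return "REFLEXIVE"
    return "AGENTIVE"  # defined by default

flew = categorize_primary_phrase("flew", 1, 0)   # "The bird flew"
hit = categorize_primary_phrase("hit", 1, 1)     # "The cyclist hit the car"
have = categorize_primary_phrase("have", 1, 1)   # "I have the book"
```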

2.3. Case Roles

Two classes of case role are identified: essential (also called
nuclear, propositional or major) and peripheral (also called modal or
minor). In an intuitive sense, essential cases are assigned to those
elements of a CLAUSE which can be interpreted as answering the questions
"who," "what," "which"; peripheral cases are assigned to elements which
answer questions like "when," "how," "why," "where," "for what." In
general, essential cases are those demanded by a PRIMARY PHRASE, while
peripheral cases are usually, if not always, optional.

The case roles and their definitions, according to Fillmore (19),
are given in Table 5.1. As pointed out above, these case roles have
been modified somewhat for the purposes of this research. The main
purpose of the changes was to enable the identification of case roles
by algorithmic means. The case roles used in this work, together with
their definitions, are presented in Table 5.2. The case roles presented
by Fillmore (19) are contrasted with those used in the present work in
Table 5.3. To give the reader some notion of the applications of case
grammar, Table 5.4 presents English sentences illustrative of each of
the five categories of PRIMARY PHRASE, together with the case-role
assignments for the various elements of each sentence.

Table 5.1 Case Roles and their Definitions According to
Fillmore (19).

TYPE        CASE ROLE      SYMBOL  DEFINITION

ESSENTIAL   AGENTIVE       A       instigator of the action, animate
            EXPERIENCER    E       affected by the action, animate
            INSTRUMENTAL   I       force or object causing action
                                   or state
            OBJECTIVE      O       semantically most neutral case
            SOURCE         S       the origin or starting point
            GOAL           G       the object or end point
            LOCATIVE       L       spatial orientation of the action

PERIPHERAL  TIME           T       temporal orientation of the
                                   action
            COMITATIVE     C       accompaniment role, animate
            BENEFACTIVE    B       benefactive role, animate

Table 5.2 Definition of Essential and Peripheral Case Roles as
Used in the Present Research.

CASE ROLE     SYMBOL  DEFINITION OF CASE ROLE

AGENT         A       the source of the action specified
                      by the PRIMARY PHRASE.

EXPERIENCER   E       the one who experiences the feeling,
                      sensation, etc., specified by the
                      PRIMARY PHRASE.

BENEFICIARY   B       the possessor (in its broadest
                      sense) of some thing, whether
                      the possession be temporary or
                      permanent, positive or negative.

OBJECTIVE     O       the receiver of the action described
                      by the PRIMARY PHRASE.

LOCATIVE      L       the place where the action described
                      by the PRIMARY PHRASE occurs.

TIME          T       the time when the action described
                      by the PRIMARY PHRASE occurs.

MANNER        M       the way in which the action described
                      by the PRIMARY PHRASE is performed.

COMITATIVE    C       the accompaniment case, a subject
                      accompanying the source of the action
                      described by the PRIMARY PHRASE.

CAUSE         Cs      the case giving the reason for the
                      action described by the PRIMARY
                      PHRASE.

PURPOSE       P       the case giving the purpose of the
                      action described by the PRIMARY
                      PHRASE.



3. Identification of Case Roles 

In this research, case roles are assigned to NOMINAL PHRASEs, 
SECONDARY PHRASEs and CLAUSES (a departure from Fillmore's procedures). 
The conditions under which such assignments are made, as well as the 
nature of the assignments, will be described in this section. In 
general, case grammar analysis is carried out on the CLAUSE. 

Table 5.3 Case Roles After Fillmore and as Used in this Research.

                       CASE ROLES
TYPE        FILLMORE        PRESENT RESEARCH

ESSENTIAL   AGENT           AGENT
            EXPERIENCER     BENEFICIARY
            GOAL            EXPERIENCER
            INSTRUMENTAL    OBJECT
            LOCATIVE
            OBJECT
            SOURCE

PERIPHERAL  BENEFACTIVE     CAUSE
            COMITATIVE      COMITATIVE
            LOCATIVE        LOCATIVE
            TIME            MANNER
                            PURPOSE
                            TIME

Table 5.4 Case-Role Assignments for Sample CLAUSES each
Containing one of the Categories of PRIMARY PHRASE.

PRIMARY PHRASE
CATEGORY       ILLUSTRATIVE EXAMPLES

STATIVE        The boy   is        a man   today.
               O         stative   O       T

BENEFACTIVE    I   have          the book   in the library.
               B   benefactive   O          L

EXPERIENCER    The little girl   liked         ice cream   with chocolate syrup.
               E                 experiencer   O           M

REFLEXIVE      The bird   flew
               A-O        reflexive

AGENTIVE       The cyclist   hit        the car
               A             agentive   O

               He   gave       me   the letter
               A    agentive   E    O

3.1. Essential Case Roles

The assignment of essential case roles within the CLAUSE is
determined by the following rules.

1. For the AGENTIVE category: A case preceding and E and O
cases following the PRIMARY PHRASE, in that order, except
if the PRIMARY PHRASE is passive (see note 31): O case preceding
and E and A cases following the PRIMARY PHRASE, in that order.

2. For the BENEFACTIVE category: B case preceding and O case
following the PRIMARY PHRASE, except if the PRIMARY PHRASE
is passive (see note 31), in which case these assignments are
reversed.

3. For the EXPERIENCER category: E case preceding and O case
following the PRIMARY PHRASE, except if the PRIMARY PHRASE
is passive (see note 31), in which case these assignments are
reversed.

4. For the REFLEXIVE category: A-O case preceding the
PRIMARY PHRASE.

5. For the STATIVE category: O case preceding and/or
following the PRIMARY PHRASE.

31. A passive PRIMARY PHRASE is defined as follows. If the PRIMARY
PHRASE is initiated by AUX and if it is followed by a SECONDARY
PHRASE introduced by "by" in the same CLAUSE, the PRIMARY
PHRASE is passive.

The case frames to which these rules give rise are summarized in
Table 5.5. The way in which these rules are applied is described in
Section 4.
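
The five rules of Section 3.1, together with the definition of a passive PRIMARY PHRASE, amount to a table lookup keyed on category and voice. A minimal sketch follows; the names `is_passive`, `CASE_FRAMES` and `essential_frame` are hypothetical, and each frame is written as a pair (cases preceding, cases following) the PRIMARY PHRASE. How a frame is matched against a CLAUSE with fewer NOMINAL PHRASEs than the frame allows is not addressed here.

```python
def is_passive(starts_with_aux, secondary_relations_after):
    """Passive test: PHRASE initiated by AUX and followed, in the same
    CLAUSE, by a SECONDARY PHRASE introduced by "by"."""
    return starts_with_aux and "by" in secondary_relations_after

# (category, passive?) -> (cases preceding, cases following)
CASE_FRAMES = {
    ("AGENTIVE", False): (["A"], ["E", "O"]),
    ("AGENTIVE", True): (["O"], ["E", "A"]),
    ("BENEFACTIVE", False): (["B"], ["O"]),
    ("BENEFACTIVE", True): (["O"], ["B"]),
    ("EXPERIENCER", False): (["E"], ["O"]),
    ("EXPERIENCER", True): (["O"], ["E"]),
    ("REFLEXIVE", False): (["A-O"], []),
    ("STATIVE", False): (["O"], ["O"]),
}

def essential_frame(category, passive=False):
    """Look up the essential case frame for a PRIMARY PHRASE category."""
    if category in ("REFLEXIVE", "STATIVE"):
        passive = False  # the rules give no passive variant for these
    return CASE_FRAMES[(category, passive)]

active = essential_frame("AGENTIVE")
passive = essential_frame("AGENTIVE", is_passive(True, ["by"]))
```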

3.2. Peripheral Case Roles

In general, the assignment of peripheral cases is made to
SECONDARY PHRASEs and is controlled by the SECONDARY RELATION which
introduces the PHRASE. The time case presents an exception to this
general statement. WORDs that signal the time case must be defined
ostensively. They are:

     again      first        once       week
     already    frequently   still      weeks
     always     later        then       when
     day        month        time       year
     days       months       today      years
     early      never        tomorrow   yet
     finally    now          tonight

If a SECONDARY PHRASE contains one of these WORDs it is assigned the T
case. Otherwise, the following rules apply.

1. A SECONDARY PHRASE initiated by "to" is assigned L case.

2. A SECONDARY PHRASE initiated by "by" is assigned M case.

3. A SECONDARY PHRASE initiated by "for" is assigned P case.

4. A SECONDARY PHRASE initiated by "with" or by "without" is
assigned C case if the PHRASE contains the NAME of an
animate entity (see note 32), and the M case otherwise.

5. A SECONDARY PHRASE initiated by any other SECONDARY RELATION
is assigned L case.

32. An animate entity is defined to be any PHRASE containing one of the
WORDS he, she, her, him, they, them, we or us.
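
The peripheral-case rules can be sketched directly from the word list and the five rules above. The function name `peripheral_case` and the representation of a SECONDARY PHRASE as a list of words whose first member is the SECONDARY RELATION are assumptions made for this illustration.

```python
TIME_WORDS = {
    "again", "already", "always", "day", "days", "early", "finally",
    "first", "frequently", "later", "month", "months", "never", "now",
    "once", "still", "then", "time", "today", "tomorrow", "tonight",
    "week", "weeks", "when", "year", "years", "yet",
}
ANIMATE_WORDS = {"he", "she", "her", "him", "they", "them", "we", "us"}

def peripheral_case(phrase_words):
    """Assign a peripheral case symbol to one SECONDARY PHRASE."""
    words = [w.lower() for w in phrase_words]
    if any(w in TIME_WORDS for w in words):
        return "T"                       # time words take precedence
    relation = words[0]
    if relation == "to":
        return "L"                       # rule 1
    if relation == "by":
        return "M"                       # rule 2
    if relation == "for":
        return "P"                       # rule 3
    if relation in ("with", "without"):  # rule 4
        animate = any(w in ANIMATE_WORDS for w in words)
        return "C" if animate else "M"
    return "L"                           # rule 5

library = peripheral_case(["in", "the", "library"])
syrup = peripheral_case(["with", "chocolate", "syrup"])
```

The two sample calls reproduce the L and M assignments shown for the BENEFACTIVE and EXPERIENCER examples of Table 5.4.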



Table 5.5 Case Frames for Essential Cases as Prescribed by the
PRIMARY PHRASE Category.

CATEGORY      PASSIVE?        CASE FRAMES
              (see note 31)

AGENTIVE      No              A AGENTIVE E, O
              Yes             O AGENTIVE E, A

BENEFACTIVE   No              B BENEFACTIVE O
              Yes             O BENEFACTIVE B

EXPERIENCER   No              E EXPERIENCER O
              Yes             O EXPERIENCER E

REFLEXIVE     -               A-O REFLEXIVE

STATIVE       -               O STATIVE
                              O STATIVE O

The way in which these rules are applied is described in Section 4.

3.3. Assignment of Case Roles to CLAUSEs

In addition to the case-role assignments described above, work has
begun to define the case roles of CLAUSEs. This is a major extension
of the original Fillmorian conception of case, one that follows
logically from the NAME/RELATION view of language espoused in this
dissertation. The rules which have so far been developed for the
assignment of case roles to CLAUSEs are restricted to CLAUSEs which
contain an element of PTC and which are initiated by a SECONDARY
RELATION. The rules are:

1. A CLAUSE initiated by "from" receives the Cs case (e.g.,
His hands were tough from handling heavy cords.).

2. A CLAUSE initiated by "for" receives the P case (e.g.,
The equipment was used for testing compounds.).

3. A CLAUSE initiated by "by" receives the M case (e.g.,
He tested the compounds by using a new method.).

4. A CLAUSE initiated by "with" receives the O case (e.g.,
She was finished with typing the dissertation.).

5. A CLAUSE initiated by "of" and preceded by the sequence
VRB ADJ receives the O case (e.g., I am sick of making cakes.).

6. A CLAUSE initiated by "of" and preceded by a NOMINAL PHRASE
is adjectival and receives the same case as the preceding
NOMINAL PHRASE (e.g., The process of making clay.).
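
The six CLAUSE rules reduce to a lookup on the initiating SECONDARY RELATION, with "of" decided by what precedes the CLAUSE. A hedged sketch, in which the function name and the way the preceding context is passed in (`"VRB ADJ"` or `"NOMP"`, plus the case already given to a preceding NOMINAL PHRASE) are illustrative assumptions:

```python
SIMPLE_CLAUSE_CASES = {"from": "Cs", "for": "P", "by": "M", "with": "O"}

def clause_case(relation, preceding=None, preceding_nominal_case=None):
    """Case role for a PTC CLAUSE initiated by a SECONDARY RELATION.

    Returns None when none of the six rules applies.
    """
    relation = relation.lower()
    if relation in SIMPLE_CLAUSE_CASES:          # rules 1-4
        return SIMPLE_CLAUSE_CASES[relation]
    if relation == "of":
        if preceding == "VRB ADJ":               # rule 5
            return "O"
        if preceding == "NOMP":                  # rule 6: inherit the case
            return preceding_nominal_case
    return None

cords = clause_case("from")                      # "...from handling heavy cords"
cakes = clause_case("of", preceding="VRB ADJ")   # "...sick of making cakes"
```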

4. The Case Grammar Program, CGP

The definitions and rules presented in the preceding section have
been incorporated into a program called CGP. The program is written
in PL/I for the IBM 370-165 computer system and requires 126,000 bytes
of main storage. Execution times are in the range 15,000 words per
minute. CGP accepts as input the output of MYRA, PAP and CAP/I. It
operates in two phases. Phase I is concerned with the categorization
of the PRIMARY PHRASE within the CLAUSE. STATIVE, BENEFACTIVE and
EXPERIENCER categories are determined by dictionary look-up. The
REFLEXIVE category is determined by considering the number of NOMINAL
PHRASEs surrounding the PRIMARY PHRASE. If only one is found and it
precedes the PRIMARY PHRASE, the category REFLEXIVE is identified. All
other PRIMARY PHRASEs are AGENTIVE.

After the PRIMARY PHRASEs have been categorized, the essential
cases are assigned. Such assignments generally follow the rules of
Section 3.1. In addition, if SECONDARY PHRASEs containing NAMEs
indicative of animate objects are introduced by the SECONDARY RELATIONs
"to" or "for", the SECONDARY PHRASE is assigned the E or B case,
respectively.

Peripheral cases are assigned to SECONDARY PHRASEs based upon the
rules given in Section 3.2. Case assignments to CLAUSES follow the
rules of Section 3.3.

A flow diagram of CGP is given in Figure 5.1, and typical results
produced by the program are illustrated in Figure 5.2. More extensive
output will be found in Appendix C.

5. Experimental Testing of CGP 

The case grammar program CGP was tested using the same documents
as processed by MYRA, CAP/I and PAP. The output from these three
programs formed the input to CGP. The results were analyzed by comparing
the cases assigned with the cases I would intuitively expect for the
construction. Differences between assignment and expectation were
treated as errors committed by CGP. The results of this error analysis
are given in Table 5.6. PAP and CGP operate in 126,000 bytes of storage
at the rate of 7,700 words per minute.

[Flow diagram. Recoverable labels: PHASE I (YES/NO decision branches;
ASSIGN AGENTIVE); PHASE II (ASSIGN MAJOR CASES FOR STATIVE; ASSIGN
MAJOR CASES FOR BENEFACTIVE; ASSIGN MAJOR CASES FOR EXPERIENCER;
ASSIGN MAJOR CASES FOR AGENTIVE; ASSIGN PERIPHERAL CASES; END).]

Figure 5.1 A flow diagram of CGP.

303. The old man   had taught   the boy
     DTR ADJ NON   AUX VRB      DTR NON
     | NOMP      | PRMP       | NOMP   |
       AGENT                    OBJECT

304. to fish
     PRP VRB
     | PRMP |

305. and the boy   loved    him    .
     CNJ DTR NON   VRB      PRN    EOS
         | NOMP  | PRMP   |
           AGENT

Figure 5.2 Sample output produced by CGP from The Old Man
and the Sea.

6. Summary

The program described in this chapter represents the first
implementation of case grammar. While the accuracy achieved was only
in the range of 75%, many of the errors were due not to CGP, but to
the programs which produced the input to the case grammar program.
Given accurate input, CGP may reasonably be expected (based upon
preliminary studies) to achieve greater than 95% accuracy. Further
work to refine the case grammar program is currently under way.



Table 5.6 Accuracy Attained in Case Grammar Assignment Made
by CGP.

DOCUMENT                                 ACCURACY ATTAINED

"..Precise Definition of 'Algorithm'"    70%
The Old Man and the Sea                  83%
"The Clavichord and How to Play It"      65%

CHAPTER VI. LANGUAGE ANALYSIS AND INDEXING

By definition, the function of a metalanguage in semantic
analysis is to singularize or to differentiate--the two
notions are interchangeable--the documents of a corpus by
the interplay of complex correspondances between natural
language formulations and equivalent expressions in a
metalanguage.

J. C. Gardin, Semantic Analysis Procedures in 
the Sciences of Man 

1. Introduction 

In previous chapters of this dissertation a series of language
analysis procedures has been described. These procedures are largely
syntax based; the strongest appeal to interpretational characteristics
is made in the case grammar analysis procedures of Chapter V. In the
present chapter, procedures are outlined which permit the construction 
of structural representations of English sentences (the metalanguage 
of Gardin (1)). Since the basic purpose for all the work described in 
this document is the development of means of producing improved indexes, 
a brief statement of how such structural representations relate to 
indexing is in order. Following that, a review of some closely related 
work is given. The remainder of the chapter is then devoted to a 
discussion of the proposed structural representations. 

2. Indexes and Indexing 

It is conventional to mention the exponential growth of the
literature of various fields as the main cause of the difficulty one
meets in trying to find data pertinent to his needs. The quantity of
documents makes researching the literature difficult, but this
difficulty is not the central problem in the array of problems which
the information storage and retrieval specialist hopes to solve. In
fact, a statement made some 300 years ago by Howell is still adequate
to sum up the important problem which must be solved if published data
are ever to be readily accessible when one needs the data:

     The reason why there is no Table or Index added hereunto
     is, that every page is so full of signal remarks that
     were they couched in an Index it would make a volume as
     big as the book and so make the Postern Gate to bear no
     proportion to the building. (2)

Much work has been done to overcome the problem which Howell solved
by deciding not to include an index to his book at all. Most of this
work has been directed toward increasing the amount of data present
in indexes, without a concomitant increase in physical size of the
index. If one can speak of "data density," these efforts have all
been directed toward increasing the "data density" of the index. One
of the very popular techniques has been that of keyword indexing. The
central assumption of keyword indexing is that more complex concepts
(data) can be formed at search time through appropriate application of
Boolean algebraic functions (relations) to combinations of keywords.
The inadequacy (3) of such an approach is due to the fact that most of
the relationships which humans impute between data elements cannot be
expressed in terms of any combination of Boolean functions. As a
consequence of the failure of keyword indexing to yield satisfactory
indexes, many attempts have been made to remedy it: to improve keyword
indexing, to expand Uniterm indexing to multi-term indexing (4), to
improve keyword-in-context indexes (5), to produce articulated
indexes (6), to simulate human memory (7), and to produce general
representations of language (8). All of these attempts are concerned
with retention (and explication) of relations among data elements.
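
The limitation of Boolean keyword search noted above can be made concrete with a small sketch. The two sample sentences, the function names and the bag-of-words treatment of a document are illustrative assumptions, not material from the studies cited: a conjunctive query retrieves both documents even though the relation between "cyclist" and "car" is reversed in the second.

```python
def keyword_set(text):
    """Reduce a document to its set of keywords (order is discarded)."""
    return set(text.lower().split())

def boolean_and(query_terms, documents):
    """Return ids of documents that contain every query term."""
    wanted = set(query_terms)
    return [doc_id for doc_id, text in documents.items()
            if wanted <= keyword_set(text)]

documents = {
    "d1": "the cyclist hit the car",
    "d2": "the car hit the cyclist",
}
hits = boolean_and(["cyclist", "hit", "car"], documents)
```

Both documents satisfy the query, so no combination of AND, OR and NOT over these keywords can separate the sentence in which the cyclist is the agent from the one in which the cyclist is the object.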

It might be reasonable to suppose that a document is its own best
representation, as Howell concluded. But this supposition rests upon
the further assumption that the words in the document have the same
significance for the reader as for the author. Thus, two important
factors must be accounted for in indexing any document. These factors
are:

1) The structural properties of a document. (A document conveys
data to the reader through the unique organization of language
elements within the document.)

2) The interpretational characteristics of a document. (A
document conveys data to the reader through the significance
imputed to these language elements outside the framework of
that document.)

Much attention has been paid the latter factor. Thesauri, authority
lists, dictionaries and other devices have been proposed, discussed,
designed, built, used and scrapped by many workers. At the same time,
the organization and structure of the document has been treated (when
recognized at all) as something to be got over as soon as possible. As
a result of this attitude, documents are deliberately shorn of the
relations they contain at the indexing stage.

To improve indexes, indexing methods must be developed which

a) retain as much of the data contained in a document as
possible;

b) retain and make explicit relationships among elements of
these data;

c) add data not expressed in a document but which derive from
relations among elements of data in a document and elements
of data contained in analysis documents (such as thesauri
and dictionaries);

e) develop efficient methods for deriving the index material
from text, with scant appeal to the "meaning" of text data.

This research has been directed toward the development of language
analysis procedures through which one might achieve these aims. In
Section 4.3 evidence will be presented to demonstrate the extent
to which these goals have been realized. For now, an indication of
how language analysis relates to indexes is in order.

3. Indexing Theory 

An indexing system may be characterized, after Landry (9), as in
Figure 6.1. In this model, four major components are identified: the
input-document space, D_i; the analysis-document space, D_a; the index
space; and the several indexes, I_1, I_2, ..., I_n, which the system
is capable of producing. The indexing system embodies a set of
procedures or mappings from input documents to index(es). These
procedures are part of the document space D_a. They operate on input
documents first to create the document space D_i, then on the elements
of D_i to produce the index space, and finally on the index space to
produce an index. The index space is a space whose dimensionality is
determined by the number of attributes the procedures in D_a ascribe to
the input documents. In general, the higher the dimensionality of the
index space, the greater the variety and adequacy of the indexes the
system can produce. Thus, the index space corresponds to a
meta-representation of the input documents. The fidelity of this
representation is a function of its dimensionality.

For the present research, the significance of these observations is
that the analysis procedures described in Chapters III-V correspond to
Landry's analysis documents and, since they identify a fairly large
number of attributes of the input text, they may be supposed to produce
an index space of high fidelity. While unequivocal evidence in support
of this supposition cannot be given, its partial verification is discussed
in the last part of this chapter. The index space may, I think, be
viewed in terms of structural representations of language. The nature
and derivation of such representations are discussed below.

4. Procedures for the Graphic Representation of Sentences 

In the remainder of this chapter I describe a graphic representation
of English sentences which I propose as an approximation to Landry's
index space (10). The procedures outlined for generating this
representation are, I believe, sufficient to permit their implementation
with little difficulty. This chapter is not concerned, however, with the
exact methods for such implementation nor with the graph theory which
might be pertinent. It is recognized that the mathematical theory
involved in the storage, matching and construction of graphs is an
important undertaking, but one whose magnitude is beyond the scope of
this work.

4.1. Why a Graphic Representation? 

It is well known that English is not strictly linear even though
it is produced in a linear fashion (11). This fact is easily illustrated
by sentences containing embedded clauses, as

Sentences that contain embedded clauses have a non-linear
structure.

For indexing, the import of this observation is that the specific
relationships which are assigned to elements of a sentence are often
not immediately obvious from the linear sequence that we call a sentence.
But if an algorithmic way of explicating these relationships, that is,
of producing a graphical representation of English text, could be
devised, it might be possible to derive indexes of various kinds from a
single representation. Perhaps more important, the representation itself
could serve as a kind of index which a person could access directly.
Work most closely related to what I shall describe is that of Fugmann
and his colleagues (12). This group has devised a machine-aided system
for the production of graphic representations of patent literature which
they believe will lead to considerably improved information retrieval
systems. This work, as well as other related work, is discussed in the
next section.

4.2. Approaches to the Graphic Representation of Language

By way of preamble, it must be noted that the graphic representations
which are of interest here are those involving directly the elements of
a sentence. Hence, I exclude from consideration all those grammatical
studies in which parse trees of various kinds are generated (or
proposed).

Many researchers have recognized the value of graphic representation
of sentences. Several approaches to the representation of language
in graphic form are reviewed briefly below.

4.2.1. Graphic Structure Proposed by Salton

Salton suggested graphs that represent sentences as an aid in
each component of his information storage and retrieval system: analysis,
identification, normalization and matching. Salton proposes representing
sentences by using tree structures (a tree being a graph which has at most
one branch entering each node and which contains no circuits (13)). The
sentences are analyzed by using a dependency grammar in which the verb
is the fulcrum, or central relator, of the sentence (14). The graphs
contain explicit syntactic relations such as noun-verb and
noun-preposition. Semantic relations such as identity and location are also
suggested as components of the graph. An example of such a graph is
shown in Figure 6.2.

Figure 6.2 Graphic representation proposed by Salton
(bracketed symbols in the graph denote Identity and Location) (15).
[Graph not reproduced. Node labels: Our Father; (which); (in) heaven;
hallowed; thy name; thy kingdom; come; thy will; be done; (on) earth;
it (thy will); is (done); in heaven.]

4.2.2. TOSAR Graphs

A somewhat different approach to the construction and use of graphs
is that developed by Fugmann, Nickelsen, Nickelsen and Winter (12),
involving the intellectual construction of graphs for use in the automated
storage, search and retrieval of concepts. The concepts used in the
chemical literature lend themselves to graphic representation and hence
provide a good starting point for investigation. In the graphic model
TOSAR (Topological Method for the Representation of Synthetic and Analytical
Relations of Concepts) developed by this group, relations between
concepts are represented by lines which join the concepts. Nodes labelled
with Roman letters represent substances and nodes labelled with a Greek
letter represent processes. Each level of the graph represents a
particular stage of a complex process. Thus the graph A characterizes
the combination of two substances (by means of an unspecified process)
while graph B characterizes the separation of a substance into two
components (again without specifying the process). An example of a graph
produced via the TOSAR model is given in Figure 6.3.

[Graph A and graph B: diagrams not reproduced.]

Figure 6.3 A graph produced via the TOSAR model of the sentence:
"Oligomerization of propylene with the aid of Al(Alkyl)
to obtain hexene, and separation and purification of
the excess propylene by fractional distillation and
recycling of the propylene" (18).
[Graph not reproduced. Node labels: Propylene; Al(Alkyl);
Oligomerization; Fractional distillation; Hexene.]

4.2.3. Graphic Representation of Deep Structure

One purpose of the TOSAR model is the adequate representation of a
concept which embodies several possible index entries. While chemical
literature especially lends itself to this type of analysis, the
developers of TOSAR believe that more investigation is needed to
define all the concepts in a document in this way (16). An effort in
this direction is the Conceptual Dependency Parser (CDP) developed by
Schank and Tesler (17). The relevance of the CDP lies in its
similarity to case grammar and in its emphasis on the relational attributes
of a sentence. According to Schank and Tesler, the CDP is an approach
to the identification of the "underlying meaning" of a sentence. Noun
phrases are assigned membership in a "governing" or an "assisting"
class, and are assigned roles such as "actor" or "participant." Each
word contained in the vocabulary of the parser is defined by a list of
attributes similar to those suggested by Katz and Fodor (19). If two
words occur which have incompatible or undefined attributes, the system
must be updated to include the necessary relationship. An example
obtained from the CDP is given in Figure 6.4 using the sentence "John's
love is good." Each type of arrow (arc) depicts a different kind of
relationship. Presumably, the sample sentence is broken down into two
"concepts": "John loves" and "One (John's love) is good." The work
of Schank and Tesler represents an attempt to acquire a more profound
analysis of sentences than the current state of the art can provide.

4.2.4. Graphic Structures Proposed by Plath

Two other programs which must be mentioned have each been implemented
for languages other than English. The first is a program which
constructs diagrams of Russian sentences (20). The program is based on
a projective grammar (see note 33) and is used as a method of presenting
the output of a predictive syntactic analyzer. The results have also been
used to classify and analyze sentences according to their structural
properties. An illustration of Plath's results is given in Figure 6.5.



33. A grammar similar to an immediate-constituent grammar.

Figure 6.4 The "underlying meaning" of the sentence "John's
love is good." as represented by the Conceptual
Dependency Parser (17).
[Graph not reproduced. Node labels: John; love; one.]

Figure 6.5 Plath's Automatically Produced Sentence Diagrams (20).
[Diagram not reproduced. Labels: automatically generated sentence diagram;
implied nodal interconnections; clause indicator. Words diagrammed:
NAPRJAZHENIJA, KONDENSATORAX, OTSCHITYVAJUTSJA, NA, EHKRANE,
OSTSILLOGRAFA, KATODNOGO.]

4.2.5. Graphic Structures Proposed by Tesniere

The second program has been implemented for French sentences by
Tesniere (21). In this research the language is partitioned into four
classes: verb, substantive, adjective and adverb. The verb is treated
as the central relator of the sentence. Tesniere theorizes that the
words of the sentence are more significant the lower they are on the
graph. Thus, in the example sentence in Figure 6.6, "vert" and "libre"
would be the most significant words.

4.3. A Proposal for the Production of Graphic Representations of
English Sentences

4.3.1. Definition of Components

The production of a graphic representation of an English sentence
hinges upon the relational elements of a sentence. A subgraph is
produced for each string which contains a PRIMARY RELATION, and then
the relationships between these subgraphs are established to produce
a graph. Such a process could, in principle, be extended beyond sentence
boundaries. The SECONDARY RELATIONs are represented by the arcs of
the graph. All other elements are represented as nodes of the graph.
COMPOSITE NAMEs (i.e., NOMINAL PHRASEs) are represented as single nodes
for simplicity in presentation. The individual elements of a
COMPOSITE NAME could be represented as distinct nodes of the graph,
and the relationships could be indicated by the arcs which join these
SIMPLE NAMEs. The desirability of doing this depends upon the particular
use intended for the graph.

Nodes of the graph will be distinguished as NAME nodes (N-nodes)
and PRIMARY-RELATION nodes (PR-nodes). Again for simplicity of
presentation, N-nodes will be labelled with capital Roman letters
instead of the specific NAME, and PR-nodes will be specified by geometric
shapes (see Table 6.1).
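
The components defined above can be pictured as a small data structure. The following is only a sketch under stated assumptions: the names `Node`, `Arc`, `SentenceGraph`, `kind` and `case_class` are introduced here for illustration and are not part of the procedures developed in this research, and the direction chosen for the two sample arcs is arbitrary.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    label: str                        # "A", "B", ... for N-nodes; any tag for PR-nodes
    kind: str                         # "N" (NAME node) or "PR" (PRIMARY-RELATION node)
    case_class: Optional[str] = None  # PR-nodes only, e.g. "Stative" (hypothetical field)


@dataclass(frozen=True)
class Arc:
    source: Node                      # node the arc leaves
    target: Node                      # node the arc points to
    code: Optional[int] = None        # numeral written over the arc, if any


@dataclass
class SentenceGraph:
    nodes: List[Node] = field(default_factory=list)
    arcs: List[Arc] = field(default_factory=list)

    def add_node(self, label: str, kind: str,
                 case_class: Optional[str] = None) -> Node:
        if kind not in ("N", "PR"):
            raise ValueError("kind must be 'N' or 'PR'")
        node = Node(label, kind, case_class)
        self.nodes.append(node)
        return node

    def add_arc(self, source: Node, target: Node,
                code: Optional[int] = None) -> Arc:
        arc = Arc(source, target, code)
        self.arcs.append(arc)
        return arc

    def n_nodes(self) -> List[Node]:
        return [n for n in self.nodes if n.kind == "N"]

    def pr_nodes(self) -> List[Node]:
        return [n for n in self.nodes if n.kind == "PR"]


# Hypothetical sentence with one PRIMARY RELATION and two COMPOSITE NAMEs.
graph = SentenceGraph()
a = graph.add_node("A", "N")
b = graph.add_node("B", "N")
pr = graph.add_node("P", "PR", case_class="Agentive")
graph.add_arc(pr, a)
graph.add_arc(pr, b)
```

Keeping COMPOSITE NAMEs as single nodes mirrors the simplification adopted in the text; splitting them into SIMPLE NAMEs would only require adding further N-nodes and arcs.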

4.3.2. Representation of PRIMARY-RELATION Nodes

A PRIMARY RELATION will appear as a node (called a PR-node) in a
graph, the form of which identifies the case-grammar class to which the
PRIMARY RELATION belongs. Arcs extend from the PR-node in such a way
that the direction of the link between the PRIMARY RELATION and its
arguments (NAMEs) is indicated. If the PRIMARY RELATION is RECESSIVE,
the symbol for the node indicates this fact. Specifically, PARTICIPIAL
MAIN RECESSIVE PRIMARY RELATIONs (RPRs) are denoted by a darkened node and
INFINITIVAL MAIN RPRs are denoted by a shaded node. Table 6.1 lists
the node symbols which will be used for the various categories of both
DOMINANT and RECESSIVE PRIMARY RELATIONs.

Table 6.1 Symbols Used to Represent Case Grammar Classes of
PRIMARY RELATIONs. [The geometric symbols are not reproduced.]

PRIMARY RELATION      SYMBOLOGY
TYPE (CASE)           DOMINANT      RECESSIVE       RECESSIVE
                                    PARTICIPIAL     INFINITIVAL
Stative
Agentive
Experiencer
Beneficiary
Reflexive

4.3.3. Representation of Case Types and COMPOSITE NAMEs

The SECONDARY RELATIONs of a sentence are represented by the arcs
of the graph. Since SECONDARY RELATIONs always have two arguments, the
arc makes explicit the arguments of the RELATION. The direction of the
arc will always be from the second argument to the first. The case
role assignment to the second argument of a SECONDARY RELATION is
retained to further specify this RELATION. The case role is indicated
by a numeral over the arc. The numerical codes corresponding to the
case types are given in Table 6.2. An example should clarify these
issues. Suppose we have, in prefix notation (cf. Chapter IX,
Section 3.4.4),

     l_2 A B

This would be represented in graphical form as

     A <----- B

Likewise, the sentence fragment

     The boy    in     the red shirt ...
        A       l_2         B

would be represented graphically as

          0
     A <----- B

The item "facet" listed along with the case types is not a case type
but indicates an unmarked adjectival relationship. Recall that a case
is assigned to a SIMPLE or COMPOSITE NAME. A NAME which implicitly
modifies a second NAME does not, therefore, receive a case grammar
assignment. Also, it is important to note that the case role is that
of the node from which an arc emanates.

Table 6.2 The Case Roles Along with the Corresponding Numerical
Codes Used in the Graphic Representation.

CASE ROLE        NUMERICAL CODE
Agent            1
Object           2
Experiencer      3
Beneficiary      4
Location         5
Time             6
Manner           7
Comitative       8
Cause            9
Purpose          10
Facet            0
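
The arc convention of this section can be sketched in a few lines. The codes are those of Table 6.2; the function names `secondary_arc` and `render`, and the one-line drawing with the numeral inside the arrow, are assumptions made here for illustration only.

```python
# Case-role codes as listed in Table 6.2 ("Facet" marks the unmarked
# adjectival relationship and is not itself a case type).
CASE_ROLE_CODES = {
    "Agent": 1, "Object": 2, "Experiencer": 3, "Beneficiary": 4,
    "Location": 5, "Time": 6, "Manner": 7, "Comitative": 8,
    "Cause": 9, "Purpose": 10, "Facet": 0,
}


def secondary_arc(first_arg, second_arg, role):
    """Build the arc for a SECONDARY RELATION with arguments (first_arg, second_arg).

    The arc runs from the second argument to the first and carries the
    code of the case role assigned to the second argument.
    """
    return {"source": second_arg, "target": first_arg,
            "code": CASE_ROLE_CODES[role]}


def render(arc):
    """Draw the arc on one line, with the numeral placed inside the arrow."""
    return f"{arc['target']} <--{arc['code']}-- {arc['source']}"


# "The boy in the red shirt": A = the boy, B = the red shirt,
# related by the unmarked adjectival relationship (code 0).
print(render(secondary_arc("A", "B", "Facet")))  # A <--0-- B
```

Because the code belongs to the node from which the arc emanates, storing it on the arc together with its source is enough to recover the case role of the second argument.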

4.3.4. Representation of Consecutive SECONDARY PHRASEs

A construction which presents difficulty in analysis is a COMPLEX
NAME consisting of a succession of SECONDARY PHRASEs. The difficulty
which this construction poses is that of correctly identifying the
first argument of the pair required by the SECONDARY RELATION (recall
that the second argument of the pair always immediately follows the
SECONDARY RELATION). Consider the sentence:

The exam was given to the girl in the library.

For the first SECONDARY RELATION, the two required arguments are A
and b. But for l_2 it is not possible to determine unequivocally
whether the arguments are b and B or B and C. Traditional grammar
would employ a proximity rule which states that a noun phrase modifies
the noun or verb phrase to which it is nearest. While this rule might
eliminate much confusion, it would place a severe restriction on the
interpretation of the sentence. The solution I propose is the following.
Since the PRIMARY RELATION is the central RELATION of a SENTENCE, all
NAMEs are related, at least indirectly, through the PRIMARY RELATION.
All ADVERBIAL SECONDARY RELATIONs (see Section 3.4) will therefore take
as their first arguments the PRIMARY RELATION of the CLAUSE in which
the SECONDARY RELATIONs are found. But this somewhat arbitrary decision
makes it desirable for the graphic structure to contain an indication
of the linear order of the phrases in the sentence. Such an indication
is provided by a dashed arrow drawn from the COMPOSITE NAME of a
SECONDARY PHRASE to the COMPOSITE NAME of the SECONDARY PHRASE which
immediately precedes it in the CLAUSE. For example,

[Graph not reproduced.]

The dashed arrow is thus an ordering relation such that "B - -> C"
implies B precedes C in the sentence from which the graph was derived.
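
The attachment rule and the ordering relation can be sketched as follows. This is an illustration under stated assumptions, not the procedure itself: the function name `attach_adverbial_phrases` is hypothetical, the ordering is stored as (earlier, later) pairs rather than as drawn arrows, and the case codes chosen for the sample sentence are guesses made only for the example.

```python
def attach_adverbial_phrases(primary_relation, phrases):
    """Attach consecutive SECONDARY PHRASEs to the PRIMARY RELATION.

    phrases: (composite_name, case_code) pairs in sentence order.
    Returns (arcs, order). Each arc is a (source, target, code) triple running
    from a COMPOSITE NAME (second argument) to the PRIMARY RELATION (first
    argument). order holds (earlier, later) pairs, which stand in for the
    dashed ordering arrows.
    """
    arcs = [(name, primary_relation, code) for name, code in phrases]
    names = [name for name, _ in phrases]
    order = list(zip(names, names[1:]))
    return arcs, order


# "The exam was given to the girl in the library."
# Assumed labels: PR = "was given", B = "the girl", C = "the library".
# The codes 4 (Beneficiary) and 5 (Location) are illustrative guesses.
arcs, order = attach_adverbial_phrases("PR", [("B", 4), ("C", 5)])
print(arcs)   # [('B', 'PR', 4), ('C', 'PR', 5)]
print(order)  # [('B', 'C')]
```

Both phrases hang from the PRIMARY RELATION, so the pair ('B', 'C') is the only record that "the girl" came before "the library" in the sentence.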

4.3.5. Representation of PHRASEs Related by COORDINATE CONJUNCTIVE
SECONDARY RELATIONs

The particular elements of CCP found in a SENTENCE may be important
to the interpretation of the SENTENCE. Thus, each member of CCP is
represented in a graph as an arc labelled with a numeral (see Table 6.3).
In principle, there may be any number of PHRASEs joined by an element
of CCN. Each PHRASE which is part of such a series is linked to the
next in succession by a dashed arc which indicates their sequential order.
The following examples serve for illustration.

The meal consisted of cheese, bread and wine.

[Graph not reproduced.]

Table 6.3 COORDINATE CONJUNCTIVE SECONDARY RELATIONs and their
Corresponding Numerical Codes.

Element      Numerical Code
and          1
or           2
but          3
nor          4
not          5
,            6

4.3.6. Representation of CLAUSEs Joined by CCN

CLAUSEs joined by an element of CCN generally have a common element
such as a PRIMARY RELATION or a NOMINAL PHRASE. If there is a common
NOMINAL PHRASE, the right-most NAME of the first NOMINAL PHRASE of one
CLAUSE is compared with the right-most NAME of the NOMINAL PHRASE of
the second CLAUSE. If these NAMEs are the same, the CLAUSEs are joined
by an arrow connecting the NOMINAL PHRASEs. Otherwise, an arc is
drawn connecting the PRIMARY RELATION of each CLAUSE. Again, the
direction of the arc indicates the order of the CLAUSEs. The following
examples illustrate this treatment.

High-level languages may be expensive but assembler language
requires more specialized personnel.

[Graph not reproduced.]
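
The comparison rule for CLAUSEs joined by an element of CCN can be sketched as below. The class `Clause`, the function `join_clauses`, the exact string comparison of NAMEs, and the choice to run the arc from the first CLAUSE to the second are all assumptions made for illustration; the sample clauses are hypothetical and are not taken from the test documents.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clause:
    primary_relation: str
    nominal_phrase: List[str]  # NAMEs of the NOMINAL PHRASE compared, left to right


def join_clauses(first: Clause, second: Clause) -> Tuple[str, str, str]:
    """Return (link_type, source, target) for two CLAUSEs joined by CCN.

    link_type is "NP" when the right-most NAMEs of the two NOMINAL PHRASEs
    are the same, and "PR" otherwise. The arc runs from the first CLAUSE
    to the second, which records their order.
    """
    if (first.nominal_phrase and second.nominal_phrase
            and first.nominal_phrase[-1] == second.nominal_phrase[-1]):
        return ("NP", " ".join(first.nominal_phrase),
                " ".join(second.nominal_phrase))
    return ("PR", first.primary_relation, second.primary_relation)


# Hypothetical clauses whose NOMINAL PHRASEs end in the same NAME.
same = join_clauses(Clause("compile", ["assembler", "programs"]),
                    Clause("run", ["indexing", "programs"]))
# Hypothetical clauses with no common right-most NAME.
different = join_clauses(Clause("compile", ["programs"]),
                         Clause("run", ["machines"]))
print(same)       # ('NP', 'assembler programs', 'indexing programs')
print(different)  # ('PR', 'compile', 'run')
```

Whether inflected forms such as "languages" and "language" should count as the same NAME is left open here; the sketch compares the strings exactly.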

4.3.7. Representation of CLAUSEs Joined by SCN

Two CLAUSEs which are joined by an element of SCN are connected by
a marked arc ( |--> ). The arc is always drawn from the CLAUSE
initiated by SCN to the second CLAUSE. If desired, the ordering of
the CLAUSEs is easily preserved. The following examples illustrate
how CLAUSEs joined by an element of SCN are depicted and how the
ordering of CLAUSEs can be preserved.

He finished the work after his friends left.

[Graph not reproduced.]

After his friends left he finished the work.

[Graph not reproduced.]

4.3.8. Representation of NOMINAL CLAUSEs

A CLAUSE may serve in a NOMINAL position of a PRIMARY CLAUSE.
When a CLAUSE serves as an argument of a PRIMARY RELATION in a
second CLAUSE, a marked arc is drawn from the PRIMARY RELATION of the
PRIMARY CLAUSE to the PRIMARY RELATION of the NOMINAL CLAUSE. The
following examples are illustrative of this treatment. (Note that
the NOMINAL CLAUSE may consist of just a PRIMARY RELATION.)

The boy knew that the answer was wrong.

I would like to go.

I would like him to go.

Mary wanted him to mail the letter.

To err is human.

Effort on sentence classification has been devoted to finding
algorithms for implementation.

[Graphs not reproduced.]

4.3.9. Representation of ADJECTIVAL CLAUSEs

ADJECTIVAL CLAUSEs are connected to a NOMINAL PHRASE by a marked
arc. The arc is drawn from the PRIMARY RELATION of the ADJECTIVAL
CLAUSE to the NOMINAL PHRASE.

Mary disliked John throwing rocks.

[Graph not reproduced.]

4.3.10. Representation of ADVERBIAL CLAUSEs

ADVERBIAL CLAUSEs are depicted in a manner similar to that for
NOMINAL and ADJECTIVAL CLAUSEs. A marked arc is drawn from the
PRIMARY RELATION of the ADVERBIAL CLAUSE to the PRIMARY RELATION which
it modifies. Examples of adverbial clauses are:

They rowed the boat to where the birds were circling.

[Graph not reproduced.]

These examples serve to illustrate the way in which SENTENCEs
in English may be transformed into graphical representations. Such
representations are susceptible of production by computer program.
Although the programs for doing so have not been written, all of the
necessary data to carry out the transformation from linear sequences
to graphical representations are made available by the programs
developed in this research.

5. Summary

In this chapter, it has been suggested that the proposed graphical
representations of SENTENCEs provide a strong basis for the derivation
of indexes. The representations preserve and explicate the NAMEs and
RELATIONs in a SENTENCE and make it possible to extract NAMEs at various
levels of complexity for various purposes. For instance, keyword
indexes, KWIC indexes, articulated indexes, and so on can still be
derived from the graphical representation, but more complex aggregates
can also be retrieved. Such graphs make it possible for the first
time to do what the chemist calls a "substructure search", that is,
to isolate component parts of a structure related to the searcher in
some meaningful (to him) way.
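
A rough sketch of the two kinds of retrieval mentioned above follows. It assumes, purely for illustration, that a graph is held as a list of (source, target, code) arcs; the functions `keyword_index` and `substructure`, the sample NAMEs, and the codes attached to the sample arcs are hypothetical.

```python
from collections import defaultdict


def keyword_index(names):
    """Map each word to the COMPOSITE NAMEs in which it occurs."""
    index = defaultdict(list)
    for name in names:
        for word in name.lower().split():
            if name not in index[word]:
                index[word].append(name)
    return dict(sorted(index.items()))


def substructure(arcs, node):
    """Return every arc that touches the given node."""
    return [arc for arc in arcs if node in (arc[0], arc[1])]


# Hypothetical NAMEs and arcs; the codes 2 and 7 are illustrative only.
names = ["excess propylene", "fractional distillation", "propylene"]
arcs = [("excess propylene", "separation", 2),
        ("fractional distillation", "separation", 7),
        ("propylene", "recycling", 2)]

print(keyword_index(names)["propylene"])  # ['excess propylene', 'propylene']
print(substructure(arcs, "separation"))
```

The keyword index uses only the node labels, whereas the substructure query depends on the arcs, which is the information a flat keyword list discards.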

Much of the work in this chapter has gone on in parallel with
that carried out by Strong (22), and the interested reader may wish
to consult that work for additional information on graphical
representations of text.

CHAPTER VII. CONCLUSIONS AND DIRECTIONS FOR FUTURE RESEARCH

1. Conclusions

One of Murphy's laws has it that whatever one wishes to do, he
must always do something else first. So it has been with this research.
The general aim of the work has been toward automated indexing. But
before that could be done a great deal of work in computational
linguistics had to be done first. Thus programs have been developed
which provide for the analysis of English text on several levels.
These programs are all efficient (at least in relation to any
earlier programs written for the same purposes) and they are
effective. Furthermore, they have all been related within a theoretical
framework of language and are therefore logically consistent. Perhaps
the most important observation to be made is that all the procedures
are more nearly free of semantic inferences than any other similar
programs so far developed. And the fact that the case grammar program
described in Chapter V is the first such program to be developed is
also a significant achievement.

But all of the language analysis procedures are only a means to
an end. That end has not been reached, but the structural
representations proposed in Chapter VI are close to the desired goal, namely
the automatic generation of a relationally rich base from which
better indexes might be derived. Further comment on that point is,
unfortunately, idle speculation now.

Some significant progress has been made in language analysis by
computer, and it may be hoped that these results will lead to new,
improved indexing procedures in the near future.

2. Directions for Future Research

Several lines of investigation suggest themselves. Refinement
of the language analysis procedures is one of these, one which is
already underway (Chapter IV). Refinements may take two directions.
One is to improve the efficiency of the programs by redesigning them
or by recoding them in assembler language, for instance. The
second is to improve the accuracy of the output of this procedure
by adding or modifying rules and by further refining the dictionaries.

A second line of investigation concerns the completion of the
definition and implementation of the graphical-representation
procedures. This area too is under active study, but much needs to
be done.

A third area of study is that dealing with more of the
interpretational aspects of language. Thus it may be important not only
to know how words relate to one another within a body of text, but
also how they are related to other bodies of text or to "real-world"
referents. For information retrieval systems this is an important
question.

There are many other possibilities. The last to be mentioned
is automatic abstracting. It may be expected that improved abstracting
procedures could be developed using data derived by the language
analysis procedures. The precise way in which linguistic data would
be used in automatic abstracting remains to be determined.

APPENDIX A. A LISTING OF DICTIONARY ELEMENTS AND THEIR FREQUENCY

The elements contained within the dictionary were chosen from the
Brown University statistical analysis of a corpus of a million
words (1). Elements were chosen on the basis of their frequency
of occurrence. The extended dictionary consists of the entire
listing presented in this Appendix. The basic dictionary consists
of all the elements except for VRBs and ADVs.
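
The relation between the two dictionaries can be illustrated with a handful of entries. The sketch below is not the dictionary format used by the programs: the mapping from element to (class, frequency) and the function name `basic_dictionary` are assumptions made here, and only six entries from the listing are shown.

```python
# A few entries taken from the listing in this Appendix:
# element -> (class, frequency).
EXTENDED_DICTIONARY = {
    "the": ("DTR", 69971),
    "of": ("PRP", 36411),
    "and": ("CCN", 28852),
    "would": ("MOD", 2714),
    "said": ("VRB", 1961),
    "even": ("ADV", 1171),
}


def basic_dictionary(extended):
    """Drop the VRB and ADV entries, leaving the basic dictionary."""
    return {word: entry for word, entry in extended.items()
            if entry[0] not in ("VRB", "ADV")}


basic = basic_dictionary(EXTENDED_DICTIONARY)
print(sorted(basic))  # ['and', 'of', 'the', 'would']
```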
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ADV 








Element 


Freauencv 


Element 


Freauencv 


On 1 V 


1747 






even 


1171 


apparently 




also 


1069 


limned 4 a 1 v 




just 


872 


recently 


123 


too 


832 




1 00 


still 


782 




1 1 <; 

113 


here 


750 




11^ 

113 


again 


578 


obvlotis Iv 


HA 


once 


499 


/«/\fnn1 ot*p1 M 


1 10 
1 lU 


a Iway s 


458 




1 no 

1U7 


away 


456 




1 OA 
lUv 


less 


438 ' 




inA 


a Imos t 


432 




1 


later 


397 




often 


368 










attv 




perhaps 


307 




wwuci y 




bieinenu 


Frequency 


^ Co ^ 






ill fPaHv 


273 


IS 


10099 




267 


was 


no t £ 

98 16 


ff^ \/t/civ^y 


261 


De 


6377 




246 


nao 


CI 


sORiet lines 


221 


are 




fur ther 


218 


have 




usually 


206 


were 




soon 


199 




0/. 70 


near 


198 


n a 


0/. 


alone 


195 




71 0 
/IZ 


finally 


191 


it*s 


302 


farther 


183 


having 


279 


simply 


170 


Vm 


268 


actually 


166 


am 


228 


especially 


160 


that^s* 


186 


certainly 


143 


can^t 


796 


ready 


143 


wasn^t 


154 


directly 


141 


there* s 


109 


particularly 


146 


you •re 


151 


likely 


151 


hadn^t 


99 


suddenly 


153 


he*d 


98 


nearly 


141 


isn^t 


77 


merely 


135 


sheM 


67 


generally 


132 


one's 


65 


clearly 


128 


they •re 


65 






we're 


58 



MOD

Elements: would, will, can, could, may, must, should, might, shall,
cannot, I'll, couldn't, wouldn't, won't, we'll, he'll, she'll

Frequencies, in the order listed: 2714, 2244, 1772, 1599, 1400, 1013,
888, 672, 267, 258, 181, 175, 129, 105, 64
[Two frequencies are missing from the listing; the pairing is not recoverable.]

AJN

Element      Frequency
do           1363
did          1044
don't        489
does         485
didn't       401
doesn't      87
got          482
get          750
let          384

CCN

Elements: and, but, or, nor, not

Frequencies, in the order listed: 28852, 4381, 4207, 195
[One frequency is missing from the listing.]

SCN

Element      Frequency
if           2199
than         1789
then         1377
like         1290
since        628
however      552
though       442
yet          419
although     319
thus         312
whether      286
therefore    205
unless       101

DTR

Element      Frequency
the          69971
a            23237
his          6997
an           3747
their        2670
its          1858
my           1319
our          1252
your         923
1            496
every        491
2            450
3            282
4            196
third        193
10           143
5            134
6            113
15           109
30           106
8            104
12           98

DTR/PRN

Element        Frequency
this           5146
one            3292
her            3037
all            3001
more           2216
other          1702
some           1617
these          1573
two            1412
first          1360
any            1345
most           1160
many           1030
much           937
each           877
those          850
own            772
both           730
[illegible]    686
another        683
three          610
few            601
enough         436
several        377
second         373
four           359
kind           313
five           286
six            220
million        204
hundred        171
ten            165
neither        141
ones           116
eight          104
whole          309
either         284

INT

Elements: very, rather, quite

Frequencies, in the order listed: 796, 281
[One frequency is missing from the listing.]

PRP

Element        Frequency
of             36411
to             26149
in             21341
for            9489
with           7289
on             6742
at             5378
by             5305
from           4369
out            2096
up             1895
over           1236
after          1070
before         1016
through        969
down           895
between        730
under          707
off            639
during         585
without        583
around         561
upon           495
until          461
toward         386
among          370
within         359
along          355
above          296
across         282
outside        210
except         181
beyond         175
inside         174
instead        173
throughout     141
despite        104
about          1815
into           1791
below          145
according      139
behind         258

PRN

Element        Frequency
he             9543
it             8756
I              5173
they           3618
you            3286
she            2859
we             2653
him            2619
them           1789
me             1181
us             612
things         368
least          343
thing          333
others         323
anyone         140
none           108

Element        Frequency
something      450
nothing        412
anything       280
everything     185

PRN

Element        Frequency
himself        603
itself         304
themselves     270
myself         129
herself        125

REL

Element        Frequency
which          3562
who            2252
what           1908
whose          252
whom           146
when           2231
where          938
how            834
while          680
why            404
whatever       112

VRB

Elements: said, made, see, know, come, go, came, take, found, went,
say, put, think, took, set, told, find, [illegible], look, knew, give,
given, become, saw, want, done, began

Frequencies, in the order listed: 1961, 1125, 79[illegible], 772, 683,
630, 625, 622, 611, 536, 507, 504, 437, 433, 426, 414, 413, 399, 399,
399, 395, 391, 377, 361, 352, 329, 320, 312
[The element and frequency columns differ in length; the pairing is not recoverable.]
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Element 


Preauencv 




Element 


Frequency 


save 


285 




continue 


107 




281 




determine 


107 


coon 


970 




serve 


107 


tell 






stress 


107 


held 






applied 


106 


keep 


264 




closed 


106 


seems 


258 




reach 


106 


brought 


o c o 

253 




write 


105 


heard 


247 




married 


105 


uecanic 


OA A 
Zh-0 


- — 


remained 


105 


known 


245 




covered 


104 


seen 


229 




played 


104 


provlue 


zio 




spent 


104 


be 1 ieve 


200 




built 


103 


says 






becomes 


102 


gone 


195 




related 


102 


kept 


186 




rise 


102 


wrote 


181 




meant 


100 


lead 


173 






makes 


172 








tried 


170 








shown 


loo 








bring 


158 








wriuuen 


514 










153 








sat 


150 








meet 


I/O 

148 










1 /. c 








con 


145 








Qhntiod 


1/. 1 








iroTn^mhoir 


uo 








comes 


137 








understand 


132 








ran 


134 








led 


132 








met 


132 








ask 


128 








consider 


126 








appear 


118 








born 


113 








include 


113 








gives 


112 








speak 


110 








expect 


108 










•r 

APPENDIX B. A LISTING OF THE SOURCE DOCUMENTS FOR THE TESTING OF
COMPUTER PROGRAMS.

The documents are:

"The Need for a More Precise Definition of 'Algorithm'"

The Old Man and the Sea

"The Clavichord and How to Play It"




KIIHMS r.UU AUTOMATIC ^.(•^Hllrl^^O •tAl.hlWf-Gi 'y?-'>f MtM HKliVUlUS OISn'SSltiN 

\hh .:ACllI'n-b. l»i«VlHUl-LN I A«V l*jUlt»US JrllCH CM *\\' PHM'\iM)1r\) UY A ."ACHli^l*- CaN M- 

WRITTEN AS AN ALGORITHM. CONVERSELY, ALL ALGORITHMS WHICH HAVE SO FAR BEEN CON
STRUCTED, AS WELL AS THOSE WHICH MAY BE EXPECTED IN THE PRESENT STATE OF SCIENCE
, CAN IN PRINCIPLE BE PERFORMED BY MACHINE. THE LAST STATEMENT REQUIRES SOME CL
ARIFICATION. AS WE HAVE ALREADY SEEN, THE ACTUAL APPLICATION OF AN ALGORITHM M
AY TURN OUT TO BE VERY LENGTHY, AND THE JOB OF RECORDING ALL THE INFORMATION
INVOLVED MAY BE ENORMOUS. ON THE OTHER HAND, THE MEMORY UNITS OF MACHINES HAVE
A LIMITED CAPACITY (SINCE THE NUMBER OF MEMORY CELLS IS FINITE AND THE CAPACITY
OF EACH CELL IS LIMITED). THEREFORE, IT MAY TURN OUT TO BE IMPOSSIBLE TO EXECUT
E AN ALGORITHM UNDER EXISTING CONDITIONS. THIS CAN BE ILLUSTRATED BY THE EUCLID
EAN ALGORITHM. THE VERY SIMPLE PROBLEM OF FINDING THE GREATEST COMMON DIVISOR O
F TWO NUMBERS CANNOT BE SOLVED BY HAND IF IT REQUIRES MORE PAPER AND INK THAN IS

AVAILABLE. SIMILARLY, A PROBLEM WILL NOT BE SOLVABLE BY MACHINE IF IT REQUIRES
MORE MEMORY SPACE THAN THERE IS IN THE MACHINE. IN SUCH CASES WE SAY THAT AN A
LGORITHM IS POTENTIALLY REALIZABLE IF IT LEADS TO THE REQUIRED RESULT IN A FINIT
E NUMBER OF STEPS (EVEN THOUGH THIS NUMBER MAY BE VERY LARGE). IN OTHER WORDS,
IT WOULD BE POSSIBLE TO USE THE ALGORITHM IN A MACHINE WHICH HAD AN UNLIMITED ME
MORY CAPACITY. THE CONNECTION BETWEEN THE IDEA OF AN ALGORITHM AND THE IDEA OF
AN AUTOMATIC MACHINE WITH A MEMORY OF INFINITE CAPACITY LEADS TO A CLEAR UNDERST
ANDING OF THE NATURE OF EACH. HOWEVER, FOR ALL OF OUR EMPHASIS ON THEIR CONNECTI
ON, WE STILL HAVE NOT DEFINED EITHER OF THESE IDEAS PRECISELY. AN EXACT MATHEMATI
CAL DEFINITION OF THE NOTION OF ALGORITHMS (AND, AT THE SAME TIME, OF AUTOMATIC C
OMPUTING MACHINES) WAS NOT PRODUCED UNTIL THE 1930'S. WHY, THROUGH THE COURSE OF
MANY CENTURIES, HAVE MATHEMATICIANS TOLERATED WITHOUT ANY PARTICULAR QUALMS AN
UNCLEAR NOTION OF ALGORITHMS? WHY IS IT THAT ONLY RECENTLY HAS AN ACUTE NEED FO
R A DEFINITION SUFFICIENTLY EXACT FOR MATHEMATICAL DISCUSSION ARISEN? EARLIER, T
HE TERM "ALGORITHM" OCCURRED IN MATHEMATICS ONLY IN CONNECTION WITH CONCRETE ALG
ORITHMS, WHERE AN ASSERTION OF THE EXISTENCE OF AN ALGORITHM WAS ALWAYS ACCOMPAN
IED BY A DESCRIPTION OF SUCH AN ALGORITHM. UNDER THESE CONDITIONS IT WAS NECESS
ARY TO SHOW ONLY THAT THE SYSTEM OF FORMAL INSTRUCTION WHEN APPLIED TO ANY DATA
IN FACT LED AUTOMATICALLY TO THE DESIRED RESULT. THUS, THE NEED FOR A PRECISE DE
FINITION OF THE NOTION OF ALGORITHM NEVER AROSE, ALTHOUGH EVERY MATHEMATICIAN HA
D A WORKING IDEA OF WHAT THE TERM MEANT. HOWEVER, IN THE COURSE OF MATHEMATICAL
PROGRESS, FACTS BEGAN TO ACCUMULATE WHICH RADICALLY CHANGED THE SITUATION. THE
MOTIVE FORCE WAS THE NATURAL DESIRE OF MATHEMATICIANS TO CONSTRUCT INCREASINGL
Y POWERFUL ALGORITHMS FOR SOLVING INCREASINGLY GENERAL TYPES OF PROBLEMS. RECAL
L THE ALGORITHM FOR FINDING SQUARE ROOTS. WE MIGHT WISH TO GENERALIZE THIS PROB
LEM: TO CONSTRUCT AN ALGORITHM FOR FINDING THE ROOT OF ANY DEGREE OF ANY GIVEN N
UMBER. IT IS NATURAL TO EXPECT THAT SUCH AN ALGORITHM WILL BE MORE DIFFICULT TO
CONSTRUCT, BUT THE PROSPECT OF HAVING IT IS ATTRACTIVE. WE MAY GO EVEN FURTHER
. FINDING THE NTH ROOT OF A NUMBER A MEANS SOLVING THE EQUATION X**N-A=0 (FINDIN
G THE ROOTS OF THE EQUATION). WE CAN FORMULATE THE STILL MORE GENERAL PROBLEM:
CONSTRUCT AN ALGORITHM FOR FINDING ALL ROOTS OF ANY EQUATION OF THE FORM A(N)X*
*N+A(N-1)X**N-1+...+A(1)X+A(0)=0, WHERE N IS AN ARBITRARY POSITIVE INTEGER. THE
CONSTRUCTION OF SUCH AN ALGORITHM IS STILL MORE DIFFICULT. IN FACT, THE BASIC C
ONTENT OF THE THEORY OF EQUATION AMOUNTS TO THE CONSTRUCTION OF JUST THIS ALGORI
THM; IT IS OF THE GREATEST IMPORTANCE. THE EXAMPLES GIVEN SHOW THE NATURAL STRIV
ING OF MATHEMATICIANS TO FIND INCREASINGLY POWERFUL ALGORITHMS TO SOLVE INCREASI
NGLY GENERAL TYPES OF PROBLEMS. OF COURSE, THE EXAMPLE OF SOLVING ALL EQUATIONS
OF THE FORM ABOVE DOES NOT REPRESENT THE LIMIT TO WHICH ONE MIGHT GO. IF WE WA
NT TO PUSH THIS DESIRE FOR INCREASINGLY GENERAL ALGORITHMS TO THE EXTREME WE MU
ST INEVITABLY CONSIDER THIS PROBLEM: CONSTRUCT AN ALGORITHM FOR SOLVING ANY MATH
EMATICAL PROBLEM. THIS IS A PROBLEM SO GENERAL THAT IT MIGHT BE CONSIDERED AN I
NSOLENT CHALLENGE TO MATHEMATICS AS A WHOLE. BESIDES THIS, IT CAN BE CRITICIZED
ON THE GROUNDS THAT IT IS NOT CLEAR WHAT IS MEANT BY "ANY MATHEMATICAL PROBLEM.
" AT THE SAME TIME, THE GREAT ALLURE OF SOLVING SUCH A PROBLEM CANNOT BE DOUBTED
. THIS PROBLEM HAS ITS OWN HISTORY. THE GREAT GERMAN MATHEMATICIAN AND PHILOSO
PHER LEIBNIZ (1646-1716) DREAMED OF AN ALL-INCLUSIVE METHOD FOR SOLVING ANY PRO
BLEM. ALTHOUGH HE WAS UNABLE TO FIND IT, LEIBNIZ STILL THOUGHT THAT THE TIME WO
ULD COME WHEN IT WOULD BE DISCOVERED, AND THAT ANY ARGUMENT AMONG MATHEMATICIANS
COULD THEN AUTOMATICALLY BE SETTLED WITH PENCIL AND PAPER. THE PROBLEM R
ECEIVED SOME REFINEMENT IN THE FORM OF ONE OF THE MOST FAMOUS PROBLEMS OF MATHEM
ATICAL LOGIC, THE DEDUCIBILITY PROBLEM. SINCE WE DO NOT HAVE THE ROOM FOR A COM
PLETE TREATMENT OF THE PROBLEM, WE SHALL MERELY SKETCH ITS GENERAL FEATURES. AS
IS WELL KNOWN, THE AXIOMATIC METHOD IN MATHEMATICS CONSISTS IN DERIVING ALL THE
OREMS IN A GIVEN THEORY BY FORMAL LOGICAL STEPS FROM CERTAIN AXIOMS WHICH ARE AC
CEPTED WITHOUT PROOF. THE OLDEST OF ALL AXIOMATIC THEORIES WAS GEOMETRY, BUT IN
MODERN MATHEMATICS ALMOST ALL THEORIES ARE CONSTRUCTED AXIOMATICALLY. MATHEMAT
ICAL LOGIC EMPLOYS A SPECIAL "LANGUAGE OF FORMULAS" THAT ENABLES US TO WRITE ANY
PROPOSITION OF A MATHEMATICAL THEORY AS A UNIQUELY DETERMINED FORMULA. IN THE
TERMINOLOGY WHICH WE USED EARLIER FOR AN ASSOCIATIVE CALCULUS, WE MAY SAY THAT
SUCH A FORMULA IS A WORD IN A SPECIAL ALPHABET CONTAINING SYMBOLS TO DENOTE LOGI
CAL OPERATIONS SUCH AS NEGATION, CONJUNCTION, AND IMPLICATION, AS WELL AS THE US
UAL MATHEMATICAL SYMBOLS, SUCH AS PARENTHESES, AND LETTERS TO DENOTE FUNCTIONS A
ND VARIABLES. HOWEVER, THE CHIEF SIMILARITY TO AN ASSOCIATIVE CALCULUS CONSISTS

IN THE POSSIBILITY OF WRITING THE LOGICAL DERIVATION OF A STATEMENT S FROM A PR
EMISE R IN THE FORM OF FORMAL TRANSFORMATION OF WORDS, VERY SIMILAR TO THE ADMIS
SIBLE SUBSTITUTIONS IN AN ASSOCIATIVE CALCULUS. THIS ALLOWS US TO SPEAK OF A LO
GICAL CALCULUS, WITH A SYSTEM OF ADMISSIBLE TRANSFORMATIONS REPRESENTING ELEMENT
ARY ACTS OF LOGICAL DEDUCTION, FROM WHICH ANY LOGICAL INFERENCE, OF ARBITRARY CO
MPLEXITY, MAY BE BUILT. AN EXAMPLE OF SUCH AN ADMISSIBLE TRANSFORMATION IS THE
ELIMINATION OF TWO CONSECUTIVE NEGATIONS IN A FORMULA; THUS, "NOT UNPROVED" MAY
BE TRANSFORMED INTO "PROVED." THE QUESTION OF THE LOGICAL DEDUCIBILITY OF THE P
ROPOSITION S FROM THE PREMISE R IN A LOGICAL CALCULUS BECOMES THE QUESTION OF TH
E EXISTENCE OF A DEDUCTIVE CHAIN LEADING FROM THE WORD REPRESENTING R TO THE WOR
D REPRESENTING S. THE DEDUCIBILITY PROBLEM MAY NOW BE FORMULATED AS FOLLOWS: FO
R ANY TWO WORDS (FORMULAS) R AND S IN A LOGICAL CALCULUS, DETERMINE WHETHER OR N
OT THERE EXISTS A DEDUCTIVE CHAIN FROM R TO S. THE SOLUTION IS SUPPOSED TO BE A
N ALGORITHM FOR SOLVING ANY PROBLEM OF THIS TYPE (ANY R AND S). SUCH AN ALGORIT
HM WOULD GIVE A GENERAL METHOD FOR SOLVING PROBLEMS IN ALL MATHEMATICAL THEORIES
WHICH ARE CONSTRUCTED AXIOMATICALLY (OR RATHER, IN ALL FINITELY AXIOMATIZABLE T
HEORIES). THE VALIDITY OF ANY STATEMENT S IN SUCH A THEORY MERELY MEANS THAT IT
CAN BE DEDUCED FROM THE SYSTEM OF AXIOMS, OR WHAT IS THE SAME THING, THAT IT CA
N BE DEDUCED FROM THE STATEMENT R WHICH ASSERTS THAT ALL THE AXIOMS HOLD. THEN T
HE APPLICATION OF THE ALGORITHM WOULD DETERMINE WHETHER OR NOT THE PROPOSITION S
WERE VALID. MOREOVER, IF THE PROPOSITION S WERE VALID, THEN WE COULD FIND A CO
RRESPONDING DEDUCTIVE CHAIN OF INFERENCE WHICH WOULD PROVE THE PROPOSITION. THE
PROPOSED ALGORITHMS WOULD IN FACT BE A SINGLE EFFECTIVE METHOD FOR SOLVING ALMOS
T ALL OF THE MATHEMATICAL PROBLEMS WHICH HAVE BEEN FORMULATED AND REMAIN UNSOLVE
D TO THIS DAY. THAT IS WHY CONSTRUCTING SUCH AN "ALL-INCLUSIVE ALGORITHM" AND A
N "OMNIPOTENT MACHINE" TO MATCH IT IS SO APPEALING A PROSPECT AND AT THE SAME T
IME SO DIFFICULT. FINDING SUCH AN ALGORITHM HAVE REMAINED INSURMOUNTABLE. FURTH
ERMORE, SIMILAR DIFFICULTIES WERE SOON ENCOUNTERED IN TRYING TO FIND ALGORITHMS
FOR CERTAIN PROBLEMS OF A FAR LESS GENERAL NATURE. AMONG THESE WERE HILBERT'S
PROBLEM ON DIOPHANTINE EQUATIONS, AS WELL AS OTHERS WHICH WILL BE DISCUSSED BEL
OW. AS A RESULT OF MANY FRUITLESS ATTEMPTS TO CONSTRUCT SUCH ALGORITHMS, IT BEC
AME CLEAR THAT THE DIFFICULTIES INVOLVED ARE BASIC, AND IT CAME TO BE SUSPECTED
THAT IT IS NOT POSSIBLE TO CONSTRUCT AN ALGORITHM FOR EVERY CLASS OF PROBLEMS.
THE ASSERTION THAT A CERTAIN CLASS OF PROBLEMS CANNOT BE SOLVED ALGORITHMICALLY
IS NOT SIMPLY A STATEMENT THAT NO ALGORITHM HAS YET BEEN DISCOVERED. IT IS THE
STATEMENT THAT SUCH AN ALGORITHM IN FACT CAN NEVER BE DISCOVERED, IN OTHER WORDS
, THAT NO SUCH AN ALGORITHM CAN EXIST. THIS ASSERTION MUST BE BASED ON SOME SOR
T OF MATHEMATICAL PROOF; HOWEVER, SUCH A PROOF MAKES NO SENSE UNTIL WE HAVE A PR
ECISE DEFINITION OF "ALGORITHM", SINCE UNTIL THEN IT IS NOT CLEAR WHAT IT IS WE
ARE TRYING TO PROVE IMPOSSIBLE. IT IS USEFUL TO REMEMBER AT THIS POINT THAT IN
THE HISTORY OF MATHEMATICS THERE HAVE BEEN OTHER PROBLEMS FOR WHICH SOLUTIONS HA
D BEEN SOUGHT IN VAIN FOR A LONG TIME, AND FOR WHICH IT WAS ONLY LATER PROVED TH
AT SOLUTIONS COULD NOT BE OBTAINED. EXAMPLES ARE THE PROBLEM OF TRISECTING THE
ANGLE AND THE PROBLEM OF SOLVING THE GENERAL FIFTH DEGREE EQUATION BY RADICALS.
A METHOD OF BISECTING AN ANGLE USING COMPASS AND STRAIGHTEDGE IS KNOWN TO EVER
Y SCHOOLBOY. THE ANCIENT GREEKS TRIED TO SOLVE THE PROBLEM OF TRISECTING AN ANG
LE USING COMPASS AND STRAIGHTEDGE. IT WAS LATER PROVED THAT TRISECTION OF AN AR
|^n•l'^Ky viCH 'li'it l\ |.-«Pr'S<^P" . M IS Al Su ".rX\. K.naU r'«Ai iMh bLM^ 

I.I.I I'h A iilAH!w*ni: k»"»^MJ • CAN ..';ffi>-.J !M r^.OS '.F IM<: l.i't J^l ICP :ib i>Y ML*nS 

< AL J^KmU fMh^F Mf r-MKJM'LAS M ^-VMC^Lbt .^hHIM aWI- r.;^ IHi.'^I-L Y MjiIHI |;.A I fl) ♦ 

Hi> T"»JH0 AM) rMiRftl !. AT II iM< . A <!:AK«'.h i-tH <;iilL»x K S'UUMiSNS bY KAOl 

CALS FOR EQUATIONS OF DEGREE HIGHER THAN FOUR WAS CARRIED ON UNSUCCESSFULLY UNTI
L THE BEGINNING OF THE NINETEENTH CENTURY, WHEN THE FOLLOWING REMARKABLE RESULT
WAS FINALLY ESTABLISHED. FOR ANY N GREATER THAN OR EQUAL TO FIVE IT IS IMPOSSI
BLE TO EXPRESS THE ROOTS OF THE GENERAL NTH DEGREE EQUATION IN TERMS OF ITS COEF
FICIENTS BY MEANS OF THE ARITHMETIC OPERATIONS AND THE OPERATION OF EXTRACTING R
OOTS. IN BOTH THESE CASES THE PROOF OF IMPOSSIBILITY TURNED OUT TO BE FEASIBLE
ONLY AFTER THERE WERE PRECISE DEFINITIONS ANSWERING THE QUESTIONS "WHAT IS MEANT
BY A COMPASS STRAIGHTEDGE CONSTRUCTION?" AND "WHAT IS MEANT BY SOLVING AN E
QUATION BY RADICALS?" NOTE THAT THESE TWO DEFINITIONS GAVE A MORE PRECISE MEANIN
G TO CERTAIN SPECIAL ALGORITHMS, NAMELY, THE ALGORITHM FOR SOLVING AN EQUATION IN
RADICALS (NOT FOR THE SOLUTION OF EQUATIONS IN GENERAL) AND THE ALGORITHM FOR T
RISECTING AN ANGLE WITH COMPASS AND STRAIGHTEDGE (NOT FOR TRISECTING BY ARBITRAR
Y DEVICES). UNTIL RECENTLY, THERE WAS NO PRECISE DEFINITION OF THE CONCEPT "ALG
ORITHM" AND THEREFORE THE CONSTRUCTION OF SUCH A DEFINITION CAME TO BE ONE OF TH
E MAJOR PROBLEMS OF MODERN MATHEMATICS. IT IS VERY IMPORTANT TO POINT OUT THAT
THE FORMULATION OF A DEFINITION OF "ALGORITHM" (OR OF ANY OTHER MATHEMATICAL DEF
INITION) MUST BE CONSIDERED NOT MERELY AN ARBITRARY AGREEMENT AMONG MATHEMATICIA
NS AS TO WHAT THE MEANING OF THE WORD "ALGORITHM" SHOULD BE. THE DEFINITION HAS
TO REFLECT ACCURATELY THE SUBSTANCE OF THOSE IDEAS WHICH ARE ACTUALLY HELD, HOW
EVER VAGUELY, AND WHICH HAVE ALREADY BEEN ILLUSTRATED BY MANY EXAMPLES. WITH TH
IS AIM, A SERIES OF INVESTIGATIONS WAS UNDERTAKEN BEGINNING IN THE 1930'S FOR CH
ARACTERIZING ALL THE METHODS WHICH WERE ACTUALLY USED IN CONSTRUCTING ALGORITHMS
. THE PROBLEM WAS TO FORMULATE A DEFINITION OF THE CONCEPT OF ALGORITHM WHICH W
OULD BE COMPLETE NOT ONLY IN FORM, BUT, MORE IMPORTANT, IN SUBSTANCE. VARIOUS W
ORKERS PROCEEDED FROM DIFFERENT LOGICAL STARTING POINTS, AND BECAUSE OF THIS, SE
VERAL DEFINITIONS WERE PROPOSED. HOWEVER, IT TURNED OUT THAT ALL OF THESE WERE
EQUIVALENT, AND THEY DEFINED THE SAME CONCEPT; THIS WAS THE MODERN DEFINITION OF
ALGORITHM. THE FACT ALL OF THESE APPARENTLY DIFFERENT DEFINITIONS WERE REALLY
ESSENTIALLY THE SAME IS QUITE SIGNIFICANT; IT INDICATES THAT WE HAVE A WORTHWHIL
E DEFINITION. FROM THE POINT OF VIEW OF MACHINE MATHEMATICS, WE ARE ESPECIALLY I
NTERESTED IN THE FORM OF THE DEFINITION WHICH PROCEEDS FROM A CONSIDERATION OF
THE PROCESSES PERFORMABLE BY MACHINES. FOR SUCH A RIGOROUS MATHEMATICAL DEFINITI
ON IT IS NECESSARY TO REPRESENT THE OPERATION OF THE MACHINE IN THE FORM OF SOME
STANDARD SCHEME, WHICH HAS AS SIMPLE A LOGICAL STRUCTURE AS POSSIBLE, BUT WHICH
IS SUFFICIENTLY PRECISE FOR USE IN MATHEMATICAL INVESTIGATIONS. THIS WAS FIRST
DONE BY THE ENGLISH MATHEMATICIAN TURING, WHO PROPOSED A VERY GENERAL BUT VERY
SIMPLE CONCEPTION OF A COMPUTING MACHINE. IT SHOULD BE NOTED THAT THE TURING MA
CHINE WAS FIRST DESCRIBED IN 1937, THAT IS, BEFORE THE CONSTRUCTION OF MODERN CO
MPUTING MACHINES. TURING PROCEEDED SIMPLY ON THE GENERAL IDEA OF EQUATING THE O
PERATION OF A MACHINE TO THE WORK OF A HUMAN CALCULATOR WHO IS FOLLOWING DEFINIT
E INSTRUCTIONS. OUR PRESENTATION OF HIS IDEAS WILL UTILIZE THE GENERAL IDEAS OF
ELECTRONIC MACHINES NOW IN USE.
THE OLD MAN AND THE SEA. HEMINGWAY.

HE WAS AN OLD MAN WHO FISHED ALONE IN A SKIFF IN THE GULF STREAM AND HE HAD GONE
EIGHTY-FOUR DAYS NOW WITHOUT TAKING A FISH. IN THE FIRST FORTY DAYS A BOY HAD B
EEN WITH HIM. BUT AFTER FORTY DAYS WITHOUT A FISH THE BOY'S PARENTS HAD TOLD HIM
THAT THE OLD MAN WAS NOW DEFINITELY AND FINALLY SALAO, WHICH IS THE WORST FORM
OF UNLUCKY, AND THE BOY HAD GONE AT THEIR ORDERS IN ANOTHER BOAT WHICH CAUGHT TH
REE GOOD FISH THE FIRST WEEK. IT MADE THE BOY SAD TO SEE THE OLD MAN COME IN EAC
H DAY WITH HIS SKIFF EMPTY AND HE ALWAYS WENT DOWN TO HELP HIM CARRY EITHER THE
PERMANENT DEFEAT. THE OLD MAN WAS THIN AND GAUNT WITH DEEP WRINKLES IN THE BACK
OF HIS NECK. THE BROWN BLOTCHES OF THE BENEVOLENT SKIN CANCER THE SUN BRINGS FR
OM ITS REFLECTION ON THE TROPIC SEA WERE ON HIS CHEEKS. THE BLOTCHES RAN WELL DO
WN THE SIDES OF HIS FACE AND HIS HANDS HAD THE DEEP-CREASED SCARS FROM HANDLING
HEAVY FISH ON THE CORDS. BUT NONE OF THESE SCARS WERE FRESH. THEY WERE AS OLD AS
EROSIONS IN A FISHLESS DESERT. EVERYTHING ABOUT HIM WAS OLD EXCEPT HIS EYES AND
THEY WERE THE SAME COLOR AS THE SEA AND WERE CHEERFUL AND UNDEFEATED.
SANTIAGO, THE BOY SAID TO HIM AS THEY CLIMBED THE BANK FROM WHERE THE SKIFF WAS
HAULED UP. I COULD GO WITH YOU AGAIN. WEVE MADE SOME MONEY. THE OLD MAN HAD TAU
GHT THE BOY TO FISH AND THE BOY LOVED HIM. NO, THE OLD MAN SAID. YOURE WITH A
LUCKY BOAT. STAY WITH THEM. BUT REMEMBER HOW YOU WENT EIGHTY-SEVEN DAYS WITHOUT
FISH AND THEN WE CAUGHT BIG ONES EVERY DAY FOR THREE WEEKS. I REMEMBER, THE OL
D MAN SAID. I KNOW YOU DID NOT LEAVE ME BECAUSE YOU DOUBTED. IT WAS PAPA MADE M
E LEAVE. I AM A BOY AND I MUST OBEY HIM. I KNOW, THE OLD MAN SAID. IT IS QUITE N
ORMAL. HE HASNT MUCH FAITH. NO, THE OLD MAN SAID. BUT WE HAVE. HAVENT WE? YES, T
HE BOY SAID. CAN I OFFER YOU A BEER ON THE TERRACE AND THEN WELL TAKE THE STUFF
HOME. WHY NOT? THE OLD MAN SAID. BETWEEN FISHERMEN. THEY SAT ON THE TERRACE AND
MANY OF THE FISHERMEN MADE FUN OF THE OLD MAN AND HE WAS NOT ANGRY. OTHERS, OF T
HE OLDER FISHERMEN, LOOKED AT HIM AND WERE SAD. BUT THEY DID NOT SHOW IT AND THE
Y SPOKE POLITELY ABOUT THE CURRENT AND THE DEPTHS THEY HAD DRIFTED THEIR LINES A
T AND THE STEADY GOOD WEATHER AND OF WHAT THEY HAD SEEN. THE SUCCESSFUL FISHERME
N OF THAT DAY WERE ALREADY IN AND HAD BUTCHERED THEIR MARLIN OUT AND CARRIED THE
M LAID FULL LENGTH ACROSS TWO PLANKS, WITH TWO MEN STAGGERING AT THE END OF EACH
PLANK, TO THE FISH HOUSE WHERE THEY WAITED FOR THE ICE TRUCK TO CARRY THEM TO T
HE MARKET IN HAVANA. THOSE WHO HAD CAUGHT SHARKS HAD TAKEN THEM TO THE SHARK FAC
TORY ON THE OTHER SIDE OF THE COVE WHERE THEY WERE HOISTED ON A BLOCK AND TACKLE
, THEIR LIVERS REMOVED, THEIR FINS CUT OFF AND THEIR HIDES SKINNED OUT AND THEIR
FLESH CUT INTO STRIPS FOR SALTING. WHEN THE WIND WAS IN THE EAST A SMELL CAME A
CROSS THE HARBOR FROM THE SHARK FACTORY; BUT TODAY THERE WAS ONLY THE FAINT EDGE
OF THE ODOR BECAUSE THE WIND HAD BACKED INTO THE NORTH AND THEN DROPPED OFF AN
D IT WAS PLEASANT AND SUNNY ON THE TERRACE. SANTIAGO, THE BOY SAID. YES, THE OLD
MAN SAID. HE WAS HOLDING HIS GLASS AND THINKING OF MANY YEARS AGO. CAN I GO OUT
TO GET SARDINES FOR YOU FOR TOMORROW? NO. GO AND PLAY BASEBALL. I CAN STILL ROW
AND ROGELIO WILL THROW THE NET. I WOULD LIKE TO GO. IF I CANNOT FISH WITH YOU,
I WOULD LIKE TO SERVE IN SOME WAY. YOU BOUGHT ME A BEER, THE OLD MAN SAID. YOU A
RE ALREADY A MAN. HOW OLD WAS I WHEN YOU FIRST TOOK ME IN A BOAT? FIVE AND YOU N
EARLY WERE KILLED WHEN I BROUGHT THE FISH IN TOO GREEN AND HE NEARLY TORE THE BO
AT TO PIECES. CAN YOU REMEMBER? I CAN REMEMBER THE TAIL SLAPPING AND BANGING AND
THE THWART BREAKING AND THE NOISE OF THE CLUBBING. I CAN REMEMBER YOU THROWING
ME INTO THE BOW WHERE THE WET COILED LINES WERE AND FEELING THE WHOLE BOAT SHIVE
R AND THE NOISE OF YOU CLUBBING HIM LIKE CHOPPING A TREE DOWN AND THE SWEET BLOO
D SMELL ALL OVER ME. CAN YOU REALLY REMEMBER THAT OR DID I JUST TELL IT TO YOU?
I REMEMBER EVERYTHING FROM WHEN WE FIRST WENT TOGETHER. THE OLD MAN LOOKED AT HI
M WITH HIS SUNBURNED, CONFIDENT LOVING EYES. IF YOU WERE MY BOY ID TAKE YOU OUT
AND GAMBLE, HE SAID. BUT YOU ARE YOUR FATHERS AND YOUR MOTHERS AND YOU ARE IN A
LUCKY BOAT. MAY I GET THE SARDINES? I KNOW WHERE I CAN GET FOUR BAITS TOO. I HAV
E MINE LEFT FROM TODAY. I PUT THEM IN SALT IN THE BOX. LET ME GET FOUR FRESH ONE
S. ONE, THE OLD MAN SAID. HIS HOPE AND HIS CONFIDENCE HAD NEVER GONE. BUT NOW TH
EY WERE FRESHENING AS WHEN THE BREEZE RISES. TWO, THE BOY SAID. TWO, THE OLD MAN
AGREED. YOU DIDNT STEAL THEM? I WOULD, THE BOY SAID. BUT I BOUGHT THESE. THANK
YOU, THE OLD MAN SAID. HE WAS TOO SIMPLE TO WONDER WHEN HE HAD ATTAINED HUMILITY
. BUT HE KNEW HE HAD ATTAINED IT AND HE KNEW IT WAS NOT DISGRACEFUL AND IT CARRI
ED NO LOSS OF TRUE PRIDE. TOMORROW IS GOING TO BE A GOOD DAY WITH THIS CURRENT,
HE SAID. WHERE ARE YOU GOING? THE BOY ASKED. FAR OUT TO COME IN WHEN THE WIND SH
IFTS. I WANT TO BE OUT BEFORE IT IS LIGHT. ILL TRY TO GET HIM TO WORK FAR OUT, T
HE BOY SAID. THEN IF YOU HOOK SOMETHING TRULY BIG WE CAN COME TO YOUR AID. HE DO
ES NOT LIKE TO WORK TOO FAR OUT. NO, THE BOY SAID. BUT I WILL SEE SOMETHING THAT
HE CANNOT SEE SUCH AS A BIRD WORKING AND GET HIM TO COME OUT AFTER DOLPHIN. ARE
HIS EYES THAT BAD? HE IS ALMOST BLIND. IT IS STRANGE, THE OLD MAN SAID. HE NEVE
R WENT TURTLE-ING. THAT IS WHAT KILLS THE EYES. BUT YOU WENT TURTLE-ING FOR YEAR
S OFF THE MOSQUITO COAST AND YOUR EYES ARE GOOD. I AM A STRANGE OLD MAN. BUT ARE
YOU STRONG ENOUGH NOW FOR A TRULY BIG FISH? I THINK SO. AND THERE ARE MANY TRIC
KS. LET US TAKE THE STUFF HOME, THE BOY SAID. SO I CAN GET THE CAST NET AND GO A
FTER THE SARDINES. THEY PICKED UP THE GEAR FROM THE BOAT. THE OLD MAN CARRIED TH
E MAST ON HIS SHOULDER AND THE BOY CARRIED THE WOODEN BOX WITH THE COILED, HARD-
BRAIDED BROWN LINES, THE GAFF AND THE HARPOON WITH ITS SHAFT. THE BOX WITH THE B
AITS WAS UNDER THE STERN OF THE SKIFF ALONG WITH THE CLUB THAT WAS USED TO SUBDU
E THE BIG FISH WHEN THEY WERE BROUGHT ALONGSIDE. NO ONE WOULD STEAL FROM THE OLD

MAN BUT IT WAS BETTER TO TAKE THE SAIL AND THE HEAVY LINES HOME AS THE DEW WAS
BAD FOR THEM AND, THOUGH HE WAS QUITE SURE NO LOCAL PEOPLE WOULD STEAL FROM HIM,
THE OLD MAN THOUGHT THAT A GAFF AND A HARPOON WERE NEEDLESS TEMPTATIONS TO LEAV
E IN A BOAT. THEY WALKED UP THE ROAD TOGETHER TO THE OLD MANS SHACK AND WENT IN
THROUGH ITS OPEN DOOR. THE OLD MAN LEANED THE MAST WITH ITS WRAPPED SAIL AGAINST
THE WALL AND THE BOY PUT THE BOX AND THE OTHER GEAR BESIDE IT. THE MAST WAS NEA
RLY AS LONG AS THE ONE ROOM OF THE SHACK. THE SHACK WAS MADE OF THE TOUGH BUD-SH
IELDS OF THE ROYAL PALM WHICH ARE CALLED GUANO AND IN IT THERE WAS A BED, A TABL
E, ONE CHAIR, AND A PLACE ON THE DIRT FLOOR TO COOK WITH CHARCOAL. ON THE BROWN
WALLS OF THE FLATTENED, OVERLAPPING LEAVES OF THE STURDY FIBERED GUANO THERE WAS
A PICTURE IN COLOR OF THE SACRED HEART OF JESUS AND ANOTHER OF THE VIRGIN OF CO
BRE. THESE WERE RELICS OF HIS WIFE. ONCE THERE HAD BEEN A TINTED PHOTOGRAPH OF
HIS WIFE ON THE WALL BUT HE HAD TAKEN IT DOWN BECAUSE IT MADE HIM TOO LONELY TO
SEE IT AND IT WAS ON THE SHELF IN THE CORNER UNDER HIS CLEAN SHIRT. WHAT DO YOU
HAVE TO EAT? THE BOY ASKED. A POT OF YELLOW RICE WITH FISH. DO YOU WANT SOME? NO
. I WILL EAT AT HOME. DO YOU WANT ME TO MAKE THE FIRE? NO. I WILL MAKE IT LATER
ON. OR I MAY EAT THE RICE COLD. MAY I TAKE THE CAST NET? OF COURSE. THERE WAS NO
CAST NET AND THE BOY REMEMBERED WHEN THEY HAD SOLD IT. BUT THEY WENT THROUGH TH
IS FICTION EVERY DAY. THERE WAS NO POT OF YELLOW RICE AND FISH AND THE BOY KNEW
THIS TOO. EIGHTY-FIVE IS A LUCKY NUMBER, THE OLD MAN SAID. HOW WOULD YOU LIKE TO
SEE ME BRING ONE IN THAT DRESSED OUT OVER A THOUSAND POUNDS? ILL GET THE CAST
NET AND GO FOR SARDINES. WILL YOU SIT IN THE SUN IN THE DOORWAY? YES. I HAVE YES
TERDAYS PAPER AND I WILL READ THE BASEBALL. THE BOY DID NOT KNOW WHETHER YESTERD
AYS PAPER WAS A FICTION TOO. BUT THE OLD MAN BROUGHT IT OUT FROM UNDER THE BED.
M WITH HIS SUNBURNED, CONFIDENT LOVING EYES. IF YOU WERE MY BOY ID TAKE YOU OUT
AND GAMBLE, HE SAID. BUT YOU ARE YOUR FATHERS AND YOUR MOTHERS AND YOU ARE IN A
LUCKY BOAT. MAY I GET THE SARDINES? I KNOW WHERE I CAN GET FOUR BAITS TOO. I HAV
E MINE LEFT FROM TODAY. I PUT THEM IN SALT IN THE BOX. LET ME GET FOUR FRESH ONE
S. ONE, THE OLD MAN SAID. HIS HOPE AND HIS CONFIDENCE HAD NEVER GONE. BUT NOW TH
EY WERE FRESHENING AS WHEN THE BREEZE RISES. TWO, THE BOY SAID. TWO, THE OLD MAN
AGREED. YOU DIDNT STEAL THEM? I WOULD, THE BOY SAID. BUT I BOUGHT THESE. THANK
YOU, THE OLD MAN SAID. HE WAS TOO SIMPLE TO WONDER WHEN HE HAD ATTAINED HUMILITY
. BUT HE KNEW HE HAD ATTAINED IT AND HE KNEW IT WAS NOT DISGRACEFUL AND IT CARRI
ED NO LOSS OF TRUE PRIDE. TOMORROW IS GOING TO BE A GOOD DAY WITH THIS CURRENT,
HE SAID. WHERE ARE YOU GOING? THE BOY ASKED. FAR OUT TO COME IN WHEN THE WIND SH
IFTS. I WANT TO BE OUT BEFORE IT IS LIGHT. ILL TRY TO GET HIM TO WORK FAR OUT, T
HE BOY SAID. THEN IF YOU HOOK SOMETHING TRULY BIG WE CAN COME TO YOUR AID. HE DO
ES NOT LIKE TO WORK TOO FAR OUT. NO, THE BOY SAID. BUT I WILL SEE SOMETHING THAT
HE CANNOT SEE SUCH AS A BIRD WORKING AND GET HIM TO COME OUT AFTER DOLPHIN. ARE
HIS EYES THAT BAD? HE IS ALMOST BLIND. IT IS STRANGE, THE OLD MAN SAID. HE NEVE
R WENT TURTLE-ING. THAT IS WHAT KILLS THE EYES. BUT YOU WENT TURTLE-ING FOR YEAR
S OFF THE MOSQUITO COAST AND YOUR EYES ARE GOOD. I AM A STRANGE OLD MAN. BUT ARE
YOU STRONG ENOUGH NOW FOR A TRULY BIG FISH? I THINK SO. AND THERE ARE MANY TRIC
KS. LET US TAKE THE STUFF HOME, THE BOY SAID. SO I CAN GET THE CAST NET AND GO A
FTER THE SARDINES. THEY PICKED UP THE GEAR FROM THE BOAT. THE OLD MAN CARRIED TH
E MAST ON HIS SHOULDER AND THE BOY CARRIED THE WOODEN BOX WITH THE COILED, HARD-
BRAIDED BROWN LINES, THE GAFF AND THE HARPOON WITH ITS SHAFT. THE BOX WITH THE B
AITS WAS UNDER THE STERN OF THE SKIFF ALONG WITH THE CLUB THAT WAS USED TO SUBDU
E THE BIG FISH WHEN THEY WERE BROUGHT ALONGSIDE. NO ONE WOULD STEAL FROM THE OLD
MAN BUT IT WAS BETTER TO TAKE THE SAIL AND THE HEAVY LINES HOME AS THE DEW WAS
BAD FOR THEM AND, THOUGH HE WAS QUITE SURE NO LOCAL PEOPLE WOULD STEAL FROM HIM,
THE OLD MAN THOUGHT THAT A GAFF AND A HARPOON WERE NEEDLESS TEMPTATIONS TO LEAV
E IN A BOAT. THEY WALKED UP THE ROAD TOGETHER TO THE OLD MANS SHACK AND WENT IN
THROUGH ITS OPEN DOOR. THE OLD MAN LEANED THE MAST WITH ITS WRAPPED SAIL AGAINST
THE WALL AND THE BOY PUT THE BOX AND THE OTHER GEAR BESIDE IT. THE MAST WAS NEA
RLY AS LONG AS THE ONE ROOM OF THE SHACK. THE SHACK WAS MADE OF THE TOUGH BUD-SH
IELDS OF THE ROYAL PALM WHICH ARE CALLED GUANO AND IN IT THERE WAS A BED, A TABL
E, ONE CHAIR, AND A PLACE ON THE DIRT FLOOR TO COOK WITH CHARCOAL. ON THE BROWN
WALLS OF THE FLATTENED, OVERLAPPING LEAVES OF THE STURDY FIBERED GUANO THERE WAS
A PICTURE IN COLOR OF THE SACRED HEART OF JESUS AND ANOTHER OF THE VIRGIN OF CO
BRE. THESE WERE RELICS OF HIS WIFE. ONCE THERE HAD BEEN A TINTED PHOTOGRAPH OF
HIS WIFE ON THE WALL BUT HE HAD TAKEN IT DOWN BECAUSE IT MADE HIM TOO LONELY TO
SEE IT AND IT WAS ON THE SHELF IN THE CORNER UNDER HIS CLEAN SHIRT. WHAT DO YOU
HAVE TO EAT? THE BOY ASKED. A POT OF YELLOW RICE WITH FISH. DO YOU WANT SOME? NO
. I WILL EAT AT HOME. DO YOU WANT ME TO MAKE THE FIRE? NO. I WILL MAKE IT LATER
ON. OR I MAY EAT THE RICE COLD. MAY I TAKE THE CAST NET? OF COURSE. THERE WAS NO
CAST NET AND THE BOY REMEMBERED WHEN THEY HAD SOLD IT. BUT THEY WENT THROUGH TH
IS FICTION EVERY DAY. THERE WAS NO POT OF YELLOW RICE AND FISH AND THE BOY KNEW
THIS TOO. EIGHTY-FIVE IS A LUCKY NUMBER, THE OLD MAN SAID. HOW WOULD YOU LIKE TO
SEE ME BRING ONE IN THAT DRESSED OUT OVER A THOUSAND POUNDS? ILL GET THE CAST
NET AND GO FOR SARDINES. WILL YOU SIT IN THE SUN IN THE DOORWAY? YES. I HAVE YES
TERDAYS PAPER AND I WILL READ THE BASEBALL. THE BOY DID NOT KNOW WHETHER YESTERD



THE CLAVICHORD AND HOW TO PLAY IT. (MARGERY HALFORD, CLAVIER, J ? 1 , 1970).

THe CLAVICHflRO I S OMi: OF THt .MOST SeNStTIVt ANO bXPRESSIVl- viictr*! 
IWTRO«t«TS, OESPUK Tl« FACT THAT IIS €^51^110715 CXT^itLV s!i"t. 

6SS6NTIAUV, THE CLAVIC(«)RU IS A SHALLOM ReCTAhf-ULAK BOX HHOSF FMf ILE STRI 

IN SOUNDBOARD. THE KEYS ARE SIMPLE LEVERS WITH A BRASS BLADE CALLED A TANGENT
MOUNTED VERTICALLY ON THE FAR END. WHEN A KEY IS DEPRESSED, THIS TANGENT
STRIKES THE STRING AT THE POINT HHFRE IT FORHS A WH brLEAVlNG THH REST OF THE S 

1""'5u1''t^'*I^*""'""'^* snwM) pRowicen is txiRiiMoiNARi^ mcm ul overtSne 

he I'JfJ"'*^ CLAVICWIRO OOES NOT EXISf REAPY-HAOE AS T OMEs ON "*"^"'"= 

THE PIANO AND HARPMCWJRP; IT IS FCRMEO ANU SHAPED <»Y THE FINCeR. AS^ A 

WITH THC KtJf THE FLAYER RETAINS CONTKOL OF THE SOUND* IMP CLAVtCMian t< T^Z i c 
«"Ts"H'^''iL'*?«'^AL%'«'A:''f: ^LL KEY^22G lis'THfrtNjV"!^ T^aS' i't " 

SLIWTeS? J;iitlWL*'?li?Et?loSsI'' '«*'«SNISSI0N UF HIS 

TH€ TONAL CAMCITV OF THf: CLAVICHCMD RANGES FROM A MrmFSr MPy^ncnarc 

2?Ts*:LL'KA5i?a\^'"^\\"i?.S^^^^^^ 2J.I2?:'«' S^iucJ'r'e'S^S: IsiHo ih 

2:--.^'"'" ""I WITHIN THIS NARROW RANCEt THE SUttTLETlES ANO MUAMC^C T 

'^^ELtrSlSiST^ cll s'^''-^ o^SHE PElwRSEr ' 

EMIfcLLISHNENTS CAN BE PLAYED CRISPLY ANU BRILLIANTLY. Shakfs. «m*p< 
APP066UTIIRAS, TRILLS, 10RNS. HOROtNTS, AnHlIOES ~ ALL SO m55«Te2|S^ 

XF'vSpifri'^.yjit! gmItest p.)polar7??*5!"'"" 

?N E«L^iIf2Tl2i;I*^«'??I!V!?i^I«?cP*"*'" ^'■*»'^* "HCHNESS OF TONE. 
TMB .1?- , J i^Ii"?*.*"^ CLAVICKIRDISTS AT THt INSTROHENT IT IS NOTED THAT 
lis 'JSJ 5 ** **** * •«*««ED DEPRESSION AT fHE KNUCKLES, THE HRIST IS VERY 

522.;^^,^"* APPEAR TO BE BARELY TOOCHINC THE KEYS. 

wiNof "* "C»«ICAL DIFFICOLIY IN PAINTING THE^NAN 

lCATEl*2N*T!^^'w^H^f?^^fcI « ?!J*.'i^'' instructions on clavichord technic. PREO 

iSS ^^R?2amy «llCHif5"'".f ,1*1 i^!'*"^^"' '"f'-''' *CTION IS SHALLOW 

•wo VIKIUALLY MEICHTLEii. IT 1$ A PHENUNENUN OF THE D006LE-EN0ED L^VFa iKuru a< 

wnf 'J;fJIi'^2^?c5^** '♦•^ '""^ PRODUCED BY A SWlifrNS FOR« "Tnl uNil^y 

WILL SOUND BETTER, SWEETER, AND RICHER AT MAXIMUM LEVER LENGTH. FOR THIS

11*^; T'nS2l*L2^/2^'5'-**"=*«"«'' '•*~»*" «*«SEO ALIKE. ARE PLA^^O 

Y URTSEchiiRv FM Sr??««"Si*l?-'*S2)L'-i'J^'*'"' ''"•C" IS USED ONL 

T MicNe NECCSMRY FOR OCTAVES OR FOR PROOUCINC A MUSICAL PHRASING. 

?JU£"*'JS?:«'**L'»«"'* 'S CONSIDERED A NOST USEFUL FINT.ER FOR PASSING UNDER THE SE 

mSsiJS ?S*Nnl6''s!??«."''**" " ''^'•^ NiRk ANO TO f"ilita« ^ " 

**^*-'* CUSTOMARY PIANIST'S HUMP. BUT THE KNUCKLES ARE 

* SPRINGY RESISTANCE MHEN THE KEY IS DEPRESSED. AND THE 
'5 DETERMINED 1Y THE SIKENClH OF THE ATTACK. THE FiScER 
IS PLACED ON THE KEY — NEVER ALLOWED TO STRIKE OR TO DROP -* lIlIH Su«ir if2t 

E"5?LL'so,'r'J5A%S*E'*M5 V!L,?f?.':r'? '".TL! * NSTrPUYE^.TH lis^FHc'lES 'fZrc 

ND^^'u^?R':?fNs;t*n.*pr^.rA ur^L?? N^i^^K^^ 

H^ll^'iT^^i^M" *"*""^ «.>iG«LY\"'^R,'o:''TH'E O^H^'HrNo'? FEWLY sJ2*TC 

FINGERS SHOULD LIE CLOSE TOGETHER, EXCEPT IN MAKING LEAPS, WHEN THE
^^22 EXTENOEO IN KfcAOINniS FOR PAYING A NOTE AS UIUcSlY AS ISsI^SlE 

L"exFln^Al^o^^.^c^J<';!ly''21'.S?^^i5 avoid sm'A:j;,iS?l;r! ^f ENEHG^^iL 

I. c«t:i«i lUN AMU UN^ECE>i#KV FO^CEt A^fO ALLUW ONLY TMP IIIP iiilMt< riii. ciZf-t^^e 

TO HOVE. THE PROPER MOVFMKMT IS Pl.i^ING VhE ^^cJ^.W rHt i^Y T« P^^^^^ 

IN mt &M.UTH ACTIm. THEN HOLOI».G THIS STRIKING F0=.« IN FWULWRMM I^Sr jT^ 



OURATtm OF TMK TONE SO THAT THt SWINO UKS NOT WAVERt t.fc.t fclTHKR SMARC. OH FLA 

»f"ih FLUCTUATION OF MESiOM. THIS TCUCH IS 'Vuf^ilJIf^'/i^^l^.-n vTKime 

FOR WHICH INFLECTIONS ARE NOT DESIRED. IN RELEASING, THE STROKE
, PERMITTING THE TIP OF THE FINGER TO SLIDE OFF THE KEY AND INTO THE
PALM OF THE HAND, IS USED BOTH TO PROLONG THE DURATION OF THE TONE AND
TO TRANSFER THE WEIGHT OF THE PLAYING FINGER TO THE NEXT FINGER TO BE USED.
FURTHERMORE, LEGATO IS ACHIEVED IN THIS WAY, TOGETHER WITH GREAT TONAL RICHNESS.
?HrSLI0IN6"nWtNtNT MfiJKRWES ONE'S FEELINC OH HUM MCH NFIGKT TO 0S6 HIR THE 

NEXT NOTE, THUS GREATLY FACILITATING THE TECHNICAL DEMANDS OF THE INSTRUMENT.
BECAUSE OF THE SHORT LENGTH OF THE NATURAL KEYS ON THE CLAVICHORD, THIS
SLIDING RELEASE IS EASY TO EFFECT AND IT FEELS QUITE NATURAL.
THE MOST CELEBRATED CHARACTERISTIC OF THE CLAVICHORD, AND THE ONE THAT HAS
GIVEN RISE TO THE MOST EXTRAORDINARILY EFFUSIVE POETRY ABOUT THE INSTRUMENT,
IS ITS ABILITY TO ALTER THE TONE AFTER A SOUND HAS BEEN PRODUCED. THIS ALTERATI
ON OCCURS WHEN THE FINGER HOLDING THE KEY IS GENTLY SHAKEN. A SLIGHT ADDITION
OF PRESSURE INCREASES THE VOLUME (AND, CONSEQUENTLY, RAISES THE PITCH SLIGHTLY)
, MIO A WWOCTiS OF ^REMOR^^ THE VOLOHE I ANO ALSO LONERS THt MTCM SL 

IGHTLY). THE GENTLE FLUCTUATIONS PRODUCED BY ROCKING THE KEY UP AND DOWN,
I^MT CNOMN SO m TM*^^ TMK STRING, NINUTELY ALTER THE STRING LENCT 

H. THE RESULT IS A VIBRATO SIMILAR TO THAT POSSIBLE ON A BOWED STRINGED
INSTRUMENT. THIS TREMBLING (GERMAN "BEBUNG", FRENCH "BALANCEMENT",
ITALIAN "TREMOLO") HAS GIVEN THE CLAVICHORD ITS REPUTATION AS A SOULFUL, TENDER
INSTRUMENT. ,^|(^|( VIBRATO IS A SLUR WITH DOTS OVER A SINGLE NOTE. THE
BEST EFFECT IS ACHIEVED WHEN THE FINGER WITHHOLDS ITS SHAKE UNTIL HALF THE VALUE
OF THE NOTE HAS PASSED. THIS AND THE PORTATO ("TRAGEN DER TONE") ARE CLOSELY
RELATED, THE DIFFERENCE BEING SOLELY IN THE NUMBER OF TIMES THE KEY IS
PRESSED 4Jm THE FINGER STROKE. SOMETIMES THE DIFFERENCE IS ROUGHLY INDICATED
BY THE NUMBER OF DOTS OVER EACH NOTE, BUT BECAUSE THIS NOTATION WAS NOT A UNIFOR
M CONVENTION, IT CANNOT BE TAKEN AS A LITERAL INTERPRETATION OF THE "BEBUNG".
A CLEARLY PERCEPTIBLE ACCENT, <>, ON EACH NOTE RESULTS WHEN THE TONE HAS
JUST ONE ADDITIONAL PRESSURE AND RELEASE, BUT A VIBRATO RESULTS WHEN THE KEY IS
CLEARLY ROCKED. THIS TREATMENT IS BEST RESERVED FOR THE SLOWER, EXPRESSIVE PASS
AGES, AND FOR THE MELODIC LINE, NOT THE ACCOMPANIMENT.

BURNEY, WRITING OF THE "BEBUNG" IN "THE PRESENT STATE OF MUSIC IN GERMANY",
SAYS THIS OF C.P.E. BACH'S PLAYING IN HAMBURG IN 1772: "WHENEVER HE HAD A LONG
NOTE TO EXPRESS, HE ABSOLUTELY CONTRIVED TO PRODUCE FROM HIS INSTRUMENT, A
SILBERMANN CLAVICHORD, A CRY OF SORROW AND COMPLAINT, SUCH AS CAN ONLY BE
EFFECTED UPON THE CLAVICHORD AND PERHAPS BY HIMSELF."

MUSIC WRITTEN ESPECIALLY FOR THE CLAVICHORD USUALLY HAD AN INTIMATE
CHARACTER, AND IT WAS COMPOSED PRINCIPALLY BEFORE THE FIRST HALF OF THE 18TH
CENTURY. DISPLAY PIECES OF A VIRTUOSO CHARACTER ARE GENERALLY UNSUITED TO THE
PERSONAL QUALITIES OF THE CLAVICHORD. THERE IS A WEALTH OF LITERATURE,
MAINLY OF LITTLE PIECES SUCH AS RONDOS, STUDIES, FANTASIAS, SONATAS, AND LIEDER.

CRAMER SAYS THAT THE ESPECIALLY REMARKABLE FEATURES OF CLAVICHORD MUSIC ARE
"FLUIDITY, SUSTAINED MELODY DIFFUSED WITH EVER-VARYING LIGHT AND SHADOW, THE
USE OF CERTAIN MUSICAL SHADING AND ALMOST COMPLETE ABSTINENCE FROM PASSAGES
WITH ARPEGGIOS, LEAPS, AND BROKEN CHORDS; ALL THESE ARE IN KEEPING WITH THE
CLAVICHORD."

THE BEST PERIOD OF THE CLAVICHORD WAS BETWEEN 1753 AND 1800, JUST BEFORE
THE PIANOFORTE EFFECTIVELY BEGAN TO RIVAL IT.

C. F. DANIEL SCHUBART, THE FOREMOST POETICAL EULOGIST OF THE PERIOD, WROTE
IN [year illegible] IN HIS "IDEEN ZU EINER AESTHETIK DER TONKUNST": "THE
CLAVICHORD, THAT INDIVIDUAL, MELANCHOLIC, INEXPRESSIBLY SWEET INSTRUMENT, HAS
ADVANTAGES OVER THE HARPSICHORD AND THE FORTEPIANO WHEN MADE BY A MASTER OF HIS
CRAFT. NOT ONLY MUSICAL COLORING, BUT ALSO THE MIDDLE TINTS, NOTES SWELLING AND
DYING AWAY, MELTING TRILLS, HARDLY BREATHING UNDER THE FINGERS, PORTATO AND
VIBRATO, IN A WORD, EXPRESSIONS FOR EVERY SHADE OF FEELING. ALL THIS CAN BE
REPRODUCED AND CONJURED UP BY THE PRESSURE OF THE FINGER, THE VIBRATION AND
THROB OF THE STRINGS, AND BY A TOUCH HEAVY OR GENTLE. THOSE WHO DO NOT LIKE
BLUSTER, FRENZY AND STORM, WHOSE HEART FINDS FREQUENT AND WELCOME RELIEF IN THE
OVERFLOW OF SWEET SENTIMENTS WILL PASS OVER THE HARPSICHORD AND CHOOSE A
CLAVICHORD."
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A tAfMf ifSS I^FFIlSlVt: ArmrCtAlttlN US CHAKACICK Al^^lfAMCD \H UHHAH 
IH HV IHIIH IN HIS •»F^&AVS fJN MMlHOCU KlrVKlAHO tNSlKimfc9«IS~t 

CtAVIClffllia IS niSf lN€lllSHliU bV C€M?tf » SUfOlNG 10H¥9 ¥Hir,H IS litAHSf^mHkU 
IHIO A SOU? IIF «MHIN0«* (tV MIANS SUSfAIH^II AI«U INIItNSIfUO I^MkSSUMC 
WHICH IH^ IHACINAllVt AMIST UNHiKSIANOS KIH fU US^ HMlUlAHliVt AND 
E^rllCIAUV KMUHS Km If I iHlKUOflCi iNfCI AOAaiCI l«IV6M€HfS« fiM IHIS ll£A$l«| II 
tFNOS IfSClF l»A||flCUiAlllY ««€U 10 THt OUlHlimiHG llf II^AUflm f-CFLlNCS AM) 
HKASING HCtUUliS AMI 1^ l:SnCIAUV SUIfCO III CXMiSS IHI^ l|rHI»t:MHftlirAt OlS^USlY 
IIIIH Uf IH^ SUIIt WHICH IS SCI CiAlliV AWI IH SUCH CHAKMlHG WAYS NKVIiAtS lISCtF IH 
AOACIO NUVfcHlfliirS** 

CLAVICHORDS ARE READILY AVAILABLE TODAY FROM MASTER BUILDERS, MOSTLY
EUROPEAN OR ENGLISH. THESE INSTRUMENTS HAVE GREAT STAMINA AND UNIFORMLY FINE
CRAFTSMANSHIP. AN IDEAL INSTRUMENT FOR SMALL, PERSONAL, INTENSELY CREATIVE
MUSIC MAKING, IT IS THE INSTRUMENT THAT EARLY PEDAGOGUES ALWAYS USED TO TRAIN
THEIR STUDENTS IN THE FINE ASPECTS OF MUSICIANSHIP. ACCORDING TO MATTHESON,
"THOSE WHO WANT TO JUDGE A SENSITIVE TOUCH AND A PURE STYLE SHOULD LEAD
THEIR CANDIDATE TO A [illegible] CLAVICHORD." IN THE OPINION OF WALTHER, "THIS
INSTRUMENT IS, SO TO SPEAK, EVERY PLAYER'S ELEMENTARY GRAMMAR."

TODAY, MANY KEYBOARDISTS WHO MIGHT WISH TO PERFECT THEIR TOUCH AND
TO REFINE THEIR SENSIBILITIES OUGHT TO FIND A RICH REWARD IN THE EXPRESSIVE
POSSIBILITIES OF THE CLAVICHORD. THE SEEKERS WILL SURELY DISCOVER IN THE
CLAVICHORD THE PERFECT COMPLEMENT TO THEIR MUSICIANSHIP.
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APPENDIX C. SAMPLE OUTPUT OF OLD MAN AND SEA AS PRODUCED BY
MYRA, CAP/I, ta AND CGP.



[Sample output listing: illegible in the scanned source.]
APPENDIX D 






APPENDIX D. DETAILED ERROR ANALYSIS OF DOCUMENTS ANALYZED BY MYRA. 

Two types of error are defined for this analysis: type A and
type B. Type A errors are those generated by problems in the
computer program; they do not reflect weaknesses in the rules. Type
B errors are inaccuracies caused by a grammatical assignment rule
described in Chapter III.
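
The two-way classification above lends itself to a simple tally. The following is only a minimal sketch in Python, not the procedure used in this research; the names ErrorType, tally_errors and error_rates, and the idea of expressing errors per word analyzed, are assumptions introduced here for illustration.

```python
from collections import Counter
from enum import Enum


class ErrorType(Enum):
    """Hypothetical labels for the two error types defined in the text."""
    A = "program"  # generated by a problem in the computer program
    B = "rule"     # caused by a grammatical assignment rule


def tally_errors(errors):
    """Count the errors of each type; `errors` is an iterable of ErrorType."""
    counts = Counter(errors)
    return {kind: counts.get(kind, 0) for kind in ErrorType}


def error_rates(errors, words_analyzed):
    """Return errors per word analyzed for each type (an assumed measure)."""
    if words_analyzed <= 0:
        raise ValueError("words_analyzed must be positive")
    tallies = tally_errors(errors)
    return {kind: n / words_analyzed for kind, n in tallies.items()}
```

Keeping the two types separate in the tally matters because only type B errors bear on the quality of the rules themselves; type A errors would disappear once the program is corrected.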
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